Skip to content

feat(supervisor): schedule-tree node affinity#3271

Merged
myftija merged 4 commits intomainfrom
schedule-affinity
Mar 25, 2026
Merged

feat(supervisor): schedule-tree node affinity#3271
myftija merged 4 commits intomainfrom
schedule-affinity

Conversation

@myftija
Copy link
Collaborator

@myftija myftija commented Mar 25, 2026

Scheduled runs create predictable hourly spikes that compete with on-demand runs for node capacity. Runs triggered "on-demand" via the SDK, API, or dashboard, are more sensitive to cold start latency since users are typically
waiting on the result. When a burst of scheduled runs lands at the top of the hour, it can saturate the shared pool resources causing contention, affecting cold starts across the board.

The idea in this change is to absorb these periodic spikes in a dedicated pool without affecting the cold starts of on-demand runs. Scheduled runs are inherently less sensitive to cold starts.

Changes in this PR

Follows up on run annotations (#3241), which made trigger origin available on every run in the tree. This PR exposes
annotations at dequeue time to the supervisor. This enables scheduling decisions based on trigger source.

The affinities are soft preferences at schedule time, so runs fall back gracefully if the target pool is out out of capacity.

@changeset-bot
Copy link

changeset-bot bot commented Mar 25, 2026

⚠️ No Changeset found

Latest commit: 9edddfd

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

Click here to learn what changesets are, and how to add one.

Click here if you're a maintainer who wants to add a changeset to this PR

Copy link
Contributor

@devin-ai-integration devin-ai-integration bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Devin Review found 1 potential issue.

View 4 additional findings in Devin Review.

Open in Devin Review

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Mar 25, 2026

Walkthrough

Adds structured run annotations (new Zod schemas TriggerSource, TriggerAction, RunAnnotations) and threads annotations through the dequeue path into workload manager create calls. Replaces a single optional large-machine env var with explicit affinity configuration (enable flags, label key/value, bounded weights) and expands schedule-affinity env vars with weights/anti-weights. Refactors Kubernetes pod labeling and node affinity generation to merge large-machine and schedule-aware soft preferences, adds a scheduled pod label, and updates the Zod env schema accordingly. Also removes duplicate RunAnnotations definitions from the API schema barrel.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Description check ⚠️ Warning Description provides clear motivation and context but does not follow the required template structure with sections like Testing, Changelog, and completed Checklist. Complete the description template by adding Testing section (test steps), Changelog section (summary of changes), Checklist items (mark as done), and issue reference (Closes #XXXX).
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed Title concisely captures the main change—introducing schedule-aware node affinity in the supervisor to handle scheduled run scheduling decisions.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch schedule-affinity

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (1)
apps/supervisor/src/workloadManager/types.ts (1)

20-39: Prefer a type alias for WorkloadManagerCreateOptions.

This contract is already being edited here, and the repo standard is type over interface in TypeScript.

♻️ Proposed refactor
-export interface WorkloadManagerCreateOptions {
+export type WorkloadManagerCreateOptions = {
   image: string;
   machine: MachinePreset;
   version: string;
   nextAttemptNumber?: number;
   dequeuedAt: Date;
   placementTags?: PlacementTag[];
   // identifiers
   envId: string;
   envType: EnvironmentType;
   orgId: string;
   projectId: string;
   deploymentFriendlyId: string;
   deploymentVersion: string;
   runId: string;
   runFriendlyId: string;
   snapshotId: string;
   snapshotFriendlyId: string;
   annotations?: RunAnnotations;
-}
+};

If the interface form is intentional for extension or declaration merging, please confirm that here.

As per coding guidelines, Use types over interfaces for TypeScript.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/supervisor/src/workloadManager/types.ts` around lines 20 - 39, Replace
the exported interface WorkloadManagerCreateOptions with an exported type alias
of the same name (WorkloadManagerCreateOptions = { ... }) to follow the repo
preference for type over interface; update the declaration around the properties
(image, machine, version, nextAttemptNumber, dequeuedAt, placementTags, envId,
envType, orgId, projectId, deploymentFriendlyId, deploymentVersion, runId,
runFriendlyId, snapshotId, snapshotFriendlyId, annotations) and keep all
property names and optionals unchanged, and if WorkloadManagerCreateOptions is
intentionally used for extension or declaration merging, confirm and document
that reason instead of converting.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@apps/supervisor/src/env.ts`:
- Around line 113-128: The new pool label key env vars
(KUBERNETES_LARGE_MACHINE_AFFINITY_POOL_LABEL_KEY and
KUBERNETES_SCHEDULE_AFFINITY_POOL_LABEL_KEY) currently allow empty or
whitespace-only values; change their schema from z.string().default(...) to
z.string().trim().min(1).default(...) so blank/whitespace-only values are
rejected at startup, matching the stricter validation used by
KUBERNETES_PROJECT_AFFINITY_TOPOLOGY_KEY.

In `@apps/supervisor/src/workloadManager/kubernetes.ts`:
- Around line 402-416: The large-machine preferred affinity is still applied for
scheduled runs causing them to prefer the large pool; update the logic so
`#getNodeAffinityRules` accepts an isScheduledRun flag (or call it with
this.#isScheduledRun(opts)) and have it suppress/omit the
preferredDuringSchedulingIgnoredDuringExecution large-* preference when
isScheduledRun is true while still returning the
requiredDuringSchedulingIgnoredDuringExecution exclusion term; update calls in
`#getAffinity` (and the other similar call sites referenced) to pass the
scheduled-run boolean and merge affinities as before.

---

Nitpick comments:
In `@apps/supervisor/src/workloadManager/types.ts`:
- Around line 20-39: Replace the exported interface WorkloadManagerCreateOptions
with an exported type alias of the same name (WorkloadManagerCreateOptions = {
... }) to follow the repo preference for type over interface; update the
declaration around the properties (image, machine, version, nextAttemptNumber,
dequeuedAt, placementTags, envId, envType, orgId, projectId,
deploymentFriendlyId, deploymentVersion, runId, runFriendlyId, snapshotId,
snapshotFriendlyId, annotations) and keep all property names and optionals
unchanged, and if WorkloadManagerCreateOptions is intentionally used for
extension or declaration merging, confirm and document that reason instead of
converting.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 4269492e-bb96-4874-b4e9-c2f629a9a683

📥 Commits

Reviewing files that changed from the base of the PR and between 2037254 and b085a41.

📒 Files selected for processing (6)
  • apps/supervisor/src/env.ts
  • apps/supervisor/src/index.ts
  • apps/supervisor/src/workloadManager/kubernetes.ts
  • apps/supervisor/src/workloadManager/types.ts
  • internal-packages/run-engine/src/engine/systems/dequeueSystem.ts
  • packages/core/src/v3/schemas/runEngine.ts
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: typecheck / typecheck
🧰 Additional context used
📓 Path-based instructions (12)
**/*.{ts,tsx}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

**/*.{ts,tsx}: Use types over interfaces for TypeScript
Avoid using enums; prefer string unions or const objects instead

**/*.{ts,tsx}: For apps and internal packages (apps/*, internal-packages/*), use pnpm run typecheck --filter <package> for verification, never use build as it proves almost nothing about correctness
Use testcontainers helpers (redisTest, postgresTest, containerTest from @internal/testcontainers) for integration tests with Redis and PostgreSQL instead of mocking
When writing Trigger.dev tasks, always import from @trigger.dev/sdk - never use @trigger.dev/sdk/v3 or deprecated client.defineJob

Files:

  • apps/supervisor/src/workloadManager/types.ts
  • apps/supervisor/src/index.ts
  • internal-packages/run-engine/src/engine/systems/dequeueSystem.ts
  • packages/core/src/v3/schemas/runEngine.ts
  • apps/supervisor/src/workloadManager/kubernetes.ts
  • apps/supervisor/src/env.ts
**/*.{ts,tsx,js,jsx}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

Use function declarations instead of default exports

**/*.{ts,tsx,js,jsx}: Use pnpm for package management in this monorepo (version 10.23.0) with Turborepo for orchestration - run commands from root with pnpm run
Add crumbs as you write code for debug tracing using // @Crumbs comments or `// `#region` `@crumbs blocks - they stay on the branch throughout development and are stripped via agentcrumbs strip before merge

Files:

  • apps/supervisor/src/workloadManager/types.ts
  • apps/supervisor/src/index.ts
  • internal-packages/run-engine/src/engine/systems/dequeueSystem.ts
  • packages/core/src/v3/schemas/runEngine.ts
  • apps/supervisor/src/workloadManager/kubernetes.ts
  • apps/supervisor/src/env.ts
**/*.ts

📄 CodeRabbit inference engine (.cursor/rules/otel-metrics.mdc)

**/*.ts: When creating or editing OTEL metrics (counters, histograms, gauges), ensure metric attributes have low cardinality by using only enums, booleans, bounded error codes, or bounded shard IDs
Do not use high-cardinality attributes in OTEL metrics such as UUIDs/IDs (envId, userId, runId, projectId, organizationId), unbounded integers (itemCount, batchSize, retryCount), timestamps (createdAt, startTime), or free-form strings (errorMessage, taskName, queueName)
When exporting OTEL metrics via OTLP to Prometheus, be aware that the exporter automatically adds unit suffixes to metric names (e.g., 'my_duration_ms' becomes 'my_duration_ms_milliseconds', 'my_counter' becomes 'my_counter_total'). Account for these transformations when writing Grafana dashboards or Prometheus queries

Files:

  • apps/supervisor/src/workloadManager/types.ts
  • apps/supervisor/src/index.ts
  • internal-packages/run-engine/src/engine/systems/dequeueSystem.ts
  • packages/core/src/v3/schemas/runEngine.ts
  • apps/supervisor/src/workloadManager/kubernetes.ts
  • apps/supervisor/src/env.ts
**/*.{js,ts,jsx,tsx,json,md,yaml,yml}

📄 CodeRabbit inference engine (AGENTS.md)

Format code using Prettier before committing

Files:

  • apps/supervisor/src/workloadManager/types.ts
  • apps/supervisor/src/index.ts
  • internal-packages/run-engine/src/engine/systems/dequeueSystem.ts
  • packages/core/src/v3/schemas/runEngine.ts
  • apps/supervisor/src/workloadManager/kubernetes.ts
  • apps/supervisor/src/env.ts
apps/supervisor/src/workloadManager/**/*.{js,ts}

📄 CodeRabbit inference engine (apps/supervisor/CLAUDE.md)

Container orchestration abstraction (Docker or Kubernetes) should be implemented in src/workloadManager/

Files:

  • apps/supervisor/src/workloadManager/types.ts
  • apps/supervisor/src/workloadManager/kubernetes.ts
apps/**/*.{ts,tsx,js,jsx}

📄 CodeRabbit inference engine (CLAUDE.md)

When modifying only server components (apps/webapp/, apps/supervisor/, etc.) with no package changes, add a .server-changes/ file instead of a changeset

Files:

  • apps/supervisor/src/workloadManager/types.ts
  • apps/supervisor/src/index.ts
  • apps/supervisor/src/workloadManager/kubernetes.ts
  • apps/supervisor/src/env.ts
internal-packages/run-engine/src/engine/systems/**/*.ts

📄 CodeRabbit inference engine (internal-packages/run-engine/CLAUDE.md)

Integrate OpenTelemetry tracer and meter instrumentation in RunEngine systems for observability

Files:

  • internal-packages/run-engine/src/engine/systems/dequeueSystem.ts
{packages/core,apps/webapp}/**/*.{ts,tsx}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

Use zod for validation in packages/core and apps/webapp

Files:

  • packages/core/src/v3/schemas/runEngine.ts
packages/core/**/*.{ts,tsx,js,jsx}

📄 CodeRabbit inference engine (packages/core/CLAUDE.md)

Never import the root package (@trigger.dev/core). Always use subpath imports such as @trigger.dev/core/v3, @trigger.dev/core/v3/utils, @trigger.dev/core/logger, or @trigger.dev/core/schemas

Files:

  • packages/core/src/v3/schemas/runEngine.ts
packages/**/*.{ts,tsx,js,jsx}

📄 CodeRabbit inference engine (CLAUDE.md)

packages/**/*.{ts,tsx,js,jsx}: For public packages (packages/*), use pnpm run build --filter <package> for verification
Add a changeset via pnpm run changeset:add when modifying any public package (packages/* or integrations/*) - default to patch for bug fixes and minor changes

Files:

  • packages/core/src/v3/schemas/runEngine.ts
packages/core/**/*.{ts,tsx}

📄 CodeRabbit inference engine (CLAUDE.md)

In the core package (@trigger.dev/core), import subpaths only - never import from root

Files:

  • packages/core/src/v3/schemas/runEngine.ts
apps/supervisor/src/env.ts

📄 CodeRabbit inference engine (apps/supervisor/CLAUDE.md)

Environment configuration should be defined in src/env.ts

Files:

  • apps/supervisor/src/env.ts
🧠 Learnings (20)
📚 Learning: 2026-03-02T12:42:47.652Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: apps/supervisor/CLAUDE.md:0-0
Timestamp: 2026-03-02T12:42:47.652Z
Learning: Applies to apps/supervisor/src/workloadManager/**/*.{js,ts} : Container orchestration abstraction (Docker or Kubernetes) should be implemented in `src/workloadManager/`

Applied to files:

  • apps/supervisor/src/workloadManager/types.ts
  • apps/supervisor/src/index.ts
  • apps/supervisor/src/workloadManager/kubernetes.ts
📚 Learning: 2026-03-25T15:29:25.853Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2026-03-25T15:29:25.853Z
Learning: Applies to **/trigger/**/*.{ts,tsx,js,jsx} : Specify CPU/RAM requirements using the `machine` option with a `preset` value (e.g., 'large-1x')

Applied to files:

  • apps/supervisor/src/workloadManager/types.ts
  • apps/supervisor/src/workloadManager/kubernetes.ts
📚 Learning: 2026-03-25T15:29:25.853Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2026-03-25T15:29:25.853Z
Learning: Applies to trigger.config.ts : Set default machine type using `defaultMachine` option in `trigger.config.ts`

Applied to files:

  • apps/supervisor/src/workloadManager/types.ts
📚 Learning: 2026-03-25T15:29:25.853Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2026-03-25T15:29:25.853Z
Learning: Applies to **/trigger/**/*.{ts,tsx,js,jsx} : Use `task()` from `trigger.dev/sdk` for basic task definitions with `id` and `run` properties

Applied to files:

  • apps/supervisor/src/workloadManager/types.ts
  • packages/core/src/v3/schemas/runEngine.ts
📚 Learning: 2026-03-23T06:24:14.566Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-23T06:24:14.566Z
Learning: Applies to **/*.{ts,tsx} : When writing Trigger.dev tasks, always import from `trigger.dev/sdk` - never use `trigger.dev/sdk/v3` or deprecated `client.defineJob`

Applied to files:

  • apps/supervisor/src/workloadManager/types.ts
📚 Learning: 2026-03-25T15:29:25.853Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2026-03-25T15:29:25.853Z
Learning: Applies to **/trigger/**/*.{ts,tsx,js,jsx} : Import from `trigger.dev/sdk` (NEVER from `trigger.dev/sdk/v3`)

Applied to files:

  • apps/supervisor/src/workloadManager/types.ts
📚 Learning: 2026-03-25T15:29:25.853Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2026-03-25T15:29:25.853Z
Learning: Applies to trigger.config.ts : Configure the Trigger.dev project using `defineConfig()` with properties like `project`, `dirs`, `retries`, `runtime`, and `build`

Applied to files:

  • apps/supervisor/src/workloadManager/types.ts
📚 Learning: 2026-03-25T15:29:25.853Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2026-03-25T15:29:25.853Z
Learning: Applies to trigger.config.ts : Configure telemetry instrumentations and exporters in the `telemetry` option

Applied to files:

  • apps/supervisor/src/workloadManager/types.ts
📚 Learning: 2026-03-22T13:26:12.060Z
Learnt from: ericallam
Repo: triggerdotdev/trigger.dev PR: 3244
File: apps/webapp/app/components/code/TextEditor.tsx:81-86
Timestamp: 2026-03-22T13:26:12.060Z
Learning: In the triggerdotdev/trigger.dev codebase, do not flag `navigator.clipboard.writeText(...)` calls for `missing-await`/`unhandled-promise` issues. These clipboard writes are intentionally invoked without `await` and without `catch` handlers across the project; keep that behavior consistent when reviewing TypeScript/TSX files (e.g., usages like in `apps/webapp/app/components/code/TextEditor.tsx`).

Applied to files:

  • apps/supervisor/src/workloadManager/types.ts
  • apps/supervisor/src/index.ts
  • internal-packages/run-engine/src/engine/systems/dequeueSystem.ts
  • packages/core/src/v3/schemas/runEngine.ts
  • apps/supervisor/src/workloadManager/kubernetes.ts
  • apps/supervisor/src/env.ts
📚 Learning: 2026-03-22T19:24:14.403Z
Learnt from: matt-aitken
Repo: triggerdotdev/trigger.dev PR: 3187
File: apps/webapp/app/v3/services/alerts/deliverErrorGroupAlert.server.ts:200-204
Timestamp: 2026-03-22T19:24:14.403Z
Learning: In the triggerdotdev/trigger.dev codebase, webhook URLs are not expected to contain embedded credentials/secrets (e.g., fields like `ProjectAlertWebhookProperties` should only hold credential-free webhook endpoints). During code review, if you see logging or inclusion of raw webhook URLs in error messages, do not automatically treat it as a credential-leak/secrets-in-logs issue by default—first verify the URL does not contain embedded credentials (for example, no username/password in the URL, no obvious secret/token query params or fragments). If the URL is credential-free per this project’s conventions, allow the logging.

Applied to files:

  • apps/supervisor/src/workloadManager/types.ts
  • apps/supervisor/src/index.ts
  • internal-packages/run-engine/src/engine/systems/dequeueSystem.ts
  • packages/core/src/v3/schemas/runEngine.ts
  • apps/supervisor/src/workloadManager/kubernetes.ts
  • apps/supervisor/src/env.ts
📚 Learning: 2026-03-02T12:43:25.254Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: internal-packages/run-engine/CLAUDE.md:0-0
Timestamp: 2026-03-02T12:43:25.254Z
Learning: Applies to internal-packages/run-engine/src/engine/systems/**/*.ts : Integrate OpenTelemetry tracer and meter instrumentation in RunEngine systems for observability

Applied to files:

  • internal-packages/run-engine/src/engine/systems/dequeueSystem.ts
  • packages/core/src/v3/schemas/runEngine.ts
📚 Learning: 2026-03-23T06:24:14.566Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-23T06:24:14.566Z
Learning: Use Run Engine 2.0 (`internal/run-engine`) and redis-worker for all new work instead of legacy V1 engine code

Applied to files:

  • internal-packages/run-engine/src/engine/systems/dequeueSystem.ts
📚 Learning: 2026-03-02T12:43:43.173Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: packages/redis-worker/CLAUDE.md:0-0
Timestamp: 2026-03-02T12:43:43.173Z
Learning: Applies to packages/redis-worker/**/redis-worker/src/queue.ts : Job queue abstraction should be Redis-backed in src/queue.ts

Applied to files:

  • internal-packages/run-engine/src/engine/systems/dequeueSystem.ts
📚 Learning: 2026-03-02T12:43:25.254Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: internal-packages/run-engine/CLAUDE.md:0-0
Timestamp: 2026-03-02T12:43:25.254Z
Learning: Applies to internal-packages/run-engine/src/engine/tests/**/*.test.ts : Implement tests for RunEngine in `src/engine/tests/` using testcontainers for Redis and PostgreSQL containerization

Applied to files:

  • packages/core/src/v3/schemas/runEngine.ts
📚 Learning: 2025-11-27T16:26:58.661Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/webapp.mdc:0-0
Timestamp: 2025-11-27T16:26:58.661Z
Learning: Use the Run Engine 2.0 from `internal/run-engine` for new run lifecycle code in the webapp instead of the legacy run engine

Applied to files:

  • packages/core/src/v3/schemas/runEngine.ts
📚 Learning: 2026-03-23T06:24:14.566Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-23T06:24:14.566Z
Learning: Applies to apps/webapp/app/v3/**/*.{ts,tsx} : In the webapp v3 directory, only modify V2 code paths when encountering V1/V2 branching in services - all new work uses Run Engine 2.0 (`internal/run-engine`) and redis-worker, not legacy V1 engine code

Applied to files:

  • packages/core/src/v3/schemas/runEngine.ts
📚 Learning: 2026-03-25T15:29:25.853Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2026-03-25T15:29:25.853Z
Learning: Applies to **/trigger/**/*.{ts,tsx,js,jsx} : Use `schemaTask()` from `trigger.dev/sdk` with a Zod schema for payload validation

Applied to files:

  • packages/core/src/v3/schemas/runEngine.ts
📚 Learning: 2026-03-25T15:29:25.853Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2026-03-25T15:29:25.853Z
Learning: Applies to **/trigger/**/*.{ts,tsx,js,jsx} : Use `task.trigger()` to trigger a single run of a task from inside another task

Applied to files:

  • packages/core/src/v3/schemas/runEngine.ts
📚 Learning: 2026-03-25T15:29:25.853Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2026-03-25T15:29:25.853Z
Learning: Applies to **/trigger/**/*.{ts,tsx,js,jsx} : Use `schedules.task()` for scheduled (cron) tasks with either declarative cron patterns or imperative schedule creation

Applied to files:

  • apps/supervisor/src/workloadManager/kubernetes.ts
📚 Learning: 2026-03-02T12:42:47.652Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: apps/supervisor/CLAUDE.md:0-0
Timestamp: 2026-03-02T12:42:47.652Z
Learning: Applies to apps/supervisor/src/env.ts : Environment configuration should be defined in `src/env.ts`

Applied to files:

  • apps/supervisor/src/env.ts
🔇 Additional comments (2)
packages/core/src/v3/schemas/runEngine.ts (1)

255-264: Nice additive schema update.

Making run.annotations optional keeps existing dequeue payloads valid while exposing the structured data the supervisor needs for placement decisions.

apps/supervisor/src/index.ts (1)

253-271: Nice boundary here.

Forwarding message.run.annotations into workloadManager.create() keeps Kubernetes-specific scheduling policy inside src/workloadManager/.

Based on learnings: Container orchestration abstraction (Docker or Kubernetes) should be implemented in src/workloadManager/.

Copy link
Contributor

@devin-ai-integration devin-ai-integration bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Devin Review found 1 new potential issue.

View 4 additional findings in Devin Review.

Open in Devin Review

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (1)
apps/supervisor/src/workloadManager/kubernetes.ts (1)

402-416: ⚠️ Potential issue | 🟠 Major

Scheduled large-* runs still prefer the large-machine pool.

When both affinity features are enabled, #getAffinity() combines two mutually exclusive preferred terms on the same default machinepool key. With the current defaults (100 for large-machine affinity vs 80 for schedule affinity), a scheduled large-* run still scores the large-machine pool highest, so it keeps landing there instead of the dedicated schedule pool. Please suppress the large-machine preferred term for scheduled runs and let the schedule preference drive placement. A focused regression test for large-* + scheduled + both affinity flags enabled would lock this down.

🎯 Possible fix
   `#getAffinity`(opts: WorkloadManagerCreateOptions): k8s.V1Affinity | undefined {
-    const largeNodeAffinity = this.#getNodeAffinityRules(opts.machine);
-    const scheduleNodeAffinity = this.#getScheduleNodeAffinityRules(this.#isScheduledRun(opts));
+    const isScheduledRun = this.#isScheduledRun(opts);
+    const largeNodeAffinity = this.#getNodeAffinityRules(opts.machine, isScheduledRun);
+    const scheduleNodeAffinity = this.#getScheduleNodeAffinityRules(isScheduledRun);
     const podAffinity = this.#getProjectPodAffinity(opts.projectId);
@@
-  `#getNodeAffinityRules`(preset: MachinePreset): k8s.V1NodeAffinity | undefined {
+  `#getNodeAffinityRules`(
+    preset: MachinePreset,
+    isScheduledRun: boolean
+  ): k8s.V1NodeAffinity | undefined {
     if (!env.KUBERNETES_LARGE_MACHINE_AFFINITY_ENABLED) {
       return undefined;
     }
 
     if (this.#isLargeMachine(preset)) {
+      if (isScheduledRun) {
+        return undefined;
+      }
+
       // soft preference for the large-machine pool, falls back to standard if unavailable
       return {

Also applies to: 437-459, 480-521

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/supervisor/src/workloadManager/kubernetes.ts` around lines 402 - 416,
The combined affinity merge is allowing large-machine preferred terms to compete
with schedule preferred terms, so in `#getAffinity` when both largeNodeAffinity
(from `#getNodeAffinityRules`) and scheduleNodeAffinity (from
`#getScheduleNodeAffinityRules`) exist for a scheduled run
(this.#isScheduledRun(opts) true), filter out the
largeNodeAffinity.preferredDuringSchedulingIgnoredDuringExecution entries that
target the default "machinepool" key before building the preferred array so
schedule affinity wins; implement this by excluding those preferred entries
(inspect their nodeSelectorTerm.matchExpressions for key "machinepool" or the
project’s machinepool key) when scheduleNodeAffinity is present and add a
focused regression test covering a large-* run that is scheduled with both
affinity flags enabled to ensure the large preferred term is suppressed.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@apps/supervisor/src/workloadManager/kubernetes.ts`:
- Around line 402-416: The combined affinity merge is allowing large-machine
preferred terms to compete with schedule preferred terms, so in `#getAffinity`
when both largeNodeAffinity (from `#getNodeAffinityRules`) and
scheduleNodeAffinity (from `#getScheduleNodeAffinityRules`) exist for a scheduled
run (this.#isScheduledRun(opts) true), filter out the
largeNodeAffinity.preferredDuringSchedulingIgnoredDuringExecution entries that
target the default "machinepool" key before building the preferred array so
schedule affinity wins; implement this by excluding those preferred entries
(inspect their nodeSelectorTerm.matchExpressions for key "machinepool" or the
project’s machinepool key) when scheduleNodeAffinity is present and add a
focused regression test covering a large-* run that is scheduled with both
affinity flags enabled to ensure the large preferred term is suppressed.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 564b823b-2f7b-4ce4-8411-d868e9316dd9

📥 Commits

Reviewing files that changed from the base of the PR and between ea78788 and 9edddfd.

📒 Files selected for processing (2)
  • apps/supervisor/src/env.ts
  • apps/supervisor/src/workloadManager/kubernetes.ts
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (27)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (3, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (5, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (6, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (5, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (8, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (6, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (3, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (7, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (2, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (4, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (2, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (8, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (4, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (7, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (1, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (1, 8)
  • GitHub Check: sdk-compat / Bun Runtime
  • GitHub Check: units / packages / 🧪 Unit Tests: Packages (1, 1)
  • GitHub Check: sdk-compat / Node.js 20.20 (ubuntu-latest)
  • GitHub Check: sdk-compat / Node.js 22.12 (ubuntu-latest)
  • GitHub Check: typecheck / typecheck
  • GitHub Check: e2e / 🧪 CLI v3 tests (windows-latest - pnpm)
  • GitHub Check: sdk-compat / Deno Runtime
  • GitHub Check: e2e / 🧪 CLI v3 tests (windows-latest - npm)
  • GitHub Check: sdk-compat / Cloudflare Workers
  • GitHub Check: e2e / 🧪 CLI v3 tests (ubuntu-latest - npm)
  • GitHub Check: e2e / 🧪 CLI v3 tests (ubuntu-latest - pnpm)
🧰 Additional context used
📓 Path-based instructions (7)
**/*.{ts,tsx}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

**/*.{ts,tsx}: Use types over interfaces for TypeScript
Avoid using enums; prefer string unions or const objects instead

**/*.{ts,tsx}: For apps and internal packages (apps/*, internal-packages/*), use pnpm run typecheck --filter <package> for verification, never use build as it proves almost nothing about correctness
Use testcontainers helpers (redisTest, postgresTest, containerTest from @internal/testcontainers) for integration tests with Redis and PostgreSQL instead of mocking
When writing Trigger.dev tasks, always import from @trigger.dev/sdk - never use @trigger.dev/sdk/v3 or deprecated client.defineJob

Files:

  • apps/supervisor/src/env.ts
  • apps/supervisor/src/workloadManager/kubernetes.ts
**/*.{ts,tsx,js,jsx}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

Use function declarations instead of default exports

**/*.{ts,tsx,js,jsx}: Use pnpm for package management in this monorepo (version 10.23.0) with Turborepo for orchestration - run commands from root with pnpm run
Add crumbs as you write code for debug tracing using // @Crumbs comments or `// `#region` `@crumbs blocks - they stay on the branch throughout development and are stripped via agentcrumbs strip before merge

Files:

  • apps/supervisor/src/env.ts
  • apps/supervisor/src/workloadManager/kubernetes.ts
**/*.ts

📄 CodeRabbit inference engine (.cursor/rules/otel-metrics.mdc)

**/*.ts: When creating or editing OTEL metrics (counters, histograms, gauges), ensure metric attributes have low cardinality by using only enums, booleans, bounded error codes, or bounded shard IDs
Do not use high-cardinality attributes in OTEL metrics such as UUIDs/IDs (envId, userId, runId, projectId, organizationId), unbounded integers (itemCount, batchSize, retryCount), timestamps (createdAt, startTime), or free-form strings (errorMessage, taskName, queueName)
When exporting OTEL metrics via OTLP to Prometheus, be aware that the exporter automatically adds unit suffixes to metric names (e.g., 'my_duration_ms' becomes 'my_duration_ms_milliseconds', 'my_counter' becomes 'my_counter_total'). Account for these transformations when writing Grafana dashboards or Prometheus queries

Files:

  • apps/supervisor/src/env.ts
  • apps/supervisor/src/workloadManager/kubernetes.ts
**/*.{js,ts,jsx,tsx,json,md,yaml,yml}

📄 CodeRabbit inference engine (AGENTS.md)

Format code using Prettier before committing

Files:

  • apps/supervisor/src/env.ts
  • apps/supervisor/src/workloadManager/kubernetes.ts
apps/supervisor/src/env.ts

📄 CodeRabbit inference engine (apps/supervisor/CLAUDE.md)

Environment configuration should be defined in src/env.ts

Files:

  • apps/supervisor/src/env.ts
apps/**/*.{ts,tsx,js,jsx}

📄 CodeRabbit inference engine (CLAUDE.md)

When modifying only server components (apps/webapp/, apps/supervisor/, etc.) with no package changes, add a .server-changes/ file instead of a changeset

Files:

  • apps/supervisor/src/env.ts
  • apps/supervisor/src/workloadManager/kubernetes.ts
apps/supervisor/src/workloadManager/**/*.{js,ts}

📄 CodeRabbit inference engine (apps/supervisor/CLAUDE.md)

Container orchestration abstraction (Docker or Kubernetes) should be implemented in src/workloadManager/

Files:

  • apps/supervisor/src/workloadManager/kubernetes.ts
🧠 Learnings (7)
📚 Learning: 2026-03-02T12:42:47.652Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: apps/supervisor/CLAUDE.md:0-0
Timestamp: 2026-03-02T12:42:47.652Z
Learning: Applies to apps/supervisor/src/env.ts : Environment configuration should be defined in `src/env.ts`

Applied to files:

  • apps/supervisor/src/env.ts
📚 Learning: 2026-03-22T13:26:12.060Z
Learnt from: ericallam
Repo: triggerdotdev/trigger.dev PR: 3244
File: apps/webapp/app/components/code/TextEditor.tsx:81-86
Timestamp: 2026-03-22T13:26:12.060Z
Learning: In the triggerdotdev/trigger.dev codebase, do not flag `navigator.clipboard.writeText(...)` calls for `missing-await`/`unhandled-promise` issues. These clipboard writes are intentionally invoked without `await` and without `catch` handlers across the project; keep that behavior consistent when reviewing TypeScript/TSX files (e.g., usages like in `apps/webapp/app/components/code/TextEditor.tsx`).

Applied to files:

  • apps/supervisor/src/env.ts
  • apps/supervisor/src/workloadManager/kubernetes.ts
📚 Learning: 2026-03-22T19:24:14.403Z
Learnt from: matt-aitken
Repo: triggerdotdev/trigger.dev PR: 3187
File: apps/webapp/app/v3/services/alerts/deliverErrorGroupAlert.server.ts:200-204
Timestamp: 2026-03-22T19:24:14.403Z
Learning: In the triggerdotdev/trigger.dev codebase, webhook URLs are not expected to contain embedded credentials/secrets (e.g., fields like `ProjectAlertWebhookProperties` should only hold credential-free webhook endpoints). During code review, if you see logging or inclusion of raw webhook URLs in error messages, do not automatically treat it as a credential-leak/secrets-in-logs issue by default—first verify the URL does not contain embedded credentials (for example, no username/password in the URL, no obvious secret/token query params or fragments). If the URL is credential-free per this project’s conventions, allow the logging.

Applied to files:

  • apps/supervisor/src/env.ts
  • apps/supervisor/src/workloadManager/kubernetes.ts
📚 Learning: 2026-03-02T12:42:47.652Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: apps/supervisor/CLAUDE.md:0-0
Timestamp: 2026-03-02T12:42:47.652Z
Learning: Applies to apps/supervisor/src/workloadManager/**/*.{js,ts} : Container orchestration abstraction (Docker or Kubernetes) should be implemented in `src/workloadManager/`

Applied to files:

  • apps/supervisor/src/workloadManager/kubernetes.ts
📚 Learning: 2026-03-22T19:27:29.014Z
Learnt from: matt-aitken
Repo: triggerdotdev/trigger.dev PR: 3187
File: apps/webapp/app/v3/services/alerts/createAlertChannel.server.ts:104-112
Timestamp: 2026-03-22T19:27:29.014Z
Learning: In `apps/webapp/app/v3/services/alerts/createAlertChannel.server.ts`, the `#scheduleErrorAlertEvaluation` helper intentionally uses the same job id (`evaluateErrorAlerts:${projectId}`) as the evaluator's periodic self-chain. The deduplication is desired: if a future run is already queued, the immediate enqueue becomes a no-op, preventing two evaluations firing in quick succession. Do not flag this as a bug or suggest a unique/timestamped id.

Applied to files:

  • apps/supervisor/src/workloadManager/kubernetes.ts
📚 Learning: 2026-03-25T15:29:25.853Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2026-03-25T15:29:25.853Z
Learning: Applies to **/trigger/**/*.{ts,tsx,js,jsx} : Use `schedules.task()` for scheduled (cron) tasks with either declarative cron patterns or imperative schedule creation

Applied to files:

  • apps/supervisor/src/workloadManager/kubernetes.ts
📚 Learning: 2026-03-25T15:29:25.853Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2026-03-25T15:29:25.853Z
Learning: Applies to **/trigger/**/*.{ts,tsx,js,jsx} : Specify CPU/RAM requirements using the `machine` option with a `preset` value (e.g., 'large-1x')

Applied to files:

  • apps/supervisor/src/workloadManager/kubernetes.ts
🔇 Additional comments (2)
apps/supervisor/src/workloadManager/kubernetes.ts (1)

349-353: Good call keeping the new scheduled label binary.

This gives the affinity logic the routing signal it needs while keeping the added label dimension bounded.

apps/supervisor/src/env.ts (1)

113-117: No changes needed to env.ts configuration.

The redesign from KUBERNETES_LARGE_MACHINE_POOL_LABEL to the new KUBERNETES_LARGE_MACHINE_AFFINITY_* structure is actually safer for rollout: both the large-machine and schedule affinity features default to ENABLED=false, so existing deployments are unaffected unless explicitly enabled. The structured configuration (key, value, weight, enabled flag) is more explicit than the single-variable predecessor and all values are properly validated. The helm chart supports passing these via extraEnvVars if custom configuration is needed.

			> Likely an incorrect or invalid review comment.

@myftija myftija merged commit 97d2f72 into main Mar 25, 2026
48 checks passed
@myftija myftija deleted the schedule-affinity branch March 25, 2026 17:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants