feat(supervisor): schedule-tree node affinity#3271
Conversation
|
WalkthroughAdds structured run annotations (new Zod schemas TriggerSource, TriggerAction, RunAnnotations) and threads annotations through the dequeue path into workload manager create calls. Replaces a single optional large-machine env var with explicit affinity configuration (enable flags, label key/value, bounded weights) and expands schedule-affinity env vars with weights/anti-weights. Refactors Kubernetes pod labeling and node affinity generation to merge large-machine and schedule-aware soft preferences, adds a Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes 🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (1)
apps/supervisor/src/workloadManager/types.ts (1)
20-39: Prefer a type alias forWorkloadManagerCreateOptions.This contract is already being edited here, and the repo standard is
typeoverinterfacein TypeScript.♻️ Proposed refactor
-export interface WorkloadManagerCreateOptions { +export type WorkloadManagerCreateOptions = { image: string; machine: MachinePreset; version: string; nextAttemptNumber?: number; dequeuedAt: Date; placementTags?: PlacementTag[]; // identifiers envId: string; envType: EnvironmentType; orgId: string; projectId: string; deploymentFriendlyId: string; deploymentVersion: string; runId: string; runFriendlyId: string; snapshotId: string; snapshotFriendlyId: string; annotations?: RunAnnotations; -} +};If the
interfaceform is intentional for extension or declaration merging, please confirm that here.As per coding guidelines,
Use types over interfaces for TypeScript.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@apps/supervisor/src/workloadManager/types.ts` around lines 20 - 39, Replace the exported interface WorkloadManagerCreateOptions with an exported type alias of the same name (WorkloadManagerCreateOptions = { ... }) to follow the repo preference for type over interface; update the declaration around the properties (image, machine, version, nextAttemptNumber, dequeuedAt, placementTags, envId, envType, orgId, projectId, deploymentFriendlyId, deploymentVersion, runId, runFriendlyId, snapshotId, snapshotFriendlyId, annotations) and keep all property names and optionals unchanged, and if WorkloadManagerCreateOptions is intentionally used for extension or declaration merging, confirm and document that reason instead of converting.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@apps/supervisor/src/env.ts`:
- Around line 113-128: The new pool label key env vars
(KUBERNETES_LARGE_MACHINE_AFFINITY_POOL_LABEL_KEY and
KUBERNETES_SCHEDULE_AFFINITY_POOL_LABEL_KEY) currently allow empty or
whitespace-only values; change their schema from z.string().default(...) to
z.string().trim().min(1).default(...) so blank/whitespace-only values are
rejected at startup, matching the stricter validation used by
KUBERNETES_PROJECT_AFFINITY_TOPOLOGY_KEY.
In `@apps/supervisor/src/workloadManager/kubernetes.ts`:
- Around line 402-416: The large-machine preferred affinity is still applied for
scheduled runs causing them to prefer the large pool; update the logic so
`#getNodeAffinityRules` accepts an isScheduledRun flag (or call it with
this.#isScheduledRun(opts)) and have it suppress/omit the
preferredDuringSchedulingIgnoredDuringExecution large-* preference when
isScheduledRun is true while still returning the
requiredDuringSchedulingIgnoredDuringExecution exclusion term; update calls in
`#getAffinity` (and the other similar call sites referenced) to pass the
scheduled-run boolean and merge affinities as before.
---
Nitpick comments:
In `@apps/supervisor/src/workloadManager/types.ts`:
- Around line 20-39: Replace the exported interface WorkloadManagerCreateOptions
with an exported type alias of the same name (WorkloadManagerCreateOptions = {
... }) to follow the repo preference for type over interface; update the
declaration around the properties (image, machine, version, nextAttemptNumber,
dequeuedAt, placementTags, envId, envType, orgId, projectId,
deploymentFriendlyId, deploymentVersion, runId, runFriendlyId, snapshotId,
snapshotFriendlyId, annotations) and keep all property names and optionals
unchanged, and if WorkloadManagerCreateOptions is intentionally used for
extension or declaration merging, confirm and document that reason instead of
converting.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 4269492e-bb96-4874-b4e9-c2f629a9a683
📒 Files selected for processing (6)
apps/supervisor/src/env.tsapps/supervisor/src/index.tsapps/supervisor/src/workloadManager/kubernetes.tsapps/supervisor/src/workloadManager/types.tsinternal-packages/run-engine/src/engine/systems/dequeueSystem.tspackages/core/src/v3/schemas/runEngine.ts
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: typecheck / typecheck
🧰 Additional context used
📓 Path-based instructions (12)
**/*.{ts,tsx}
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
**/*.{ts,tsx}: Use types over interfaces for TypeScript
Avoid using enums; prefer string unions or const objects instead
**/*.{ts,tsx}: For apps and internal packages (apps/*,internal-packages/*), usepnpm run typecheck --filter <package>for verification, never usebuildas it proves almost nothing about correctness
Use testcontainers helpers (redisTest,postgresTest,containerTestfrom@internal/testcontainers) for integration tests with Redis and PostgreSQL instead of mocking
When writing Trigger.dev tasks, always import from@trigger.dev/sdk- never use@trigger.dev/sdk/v3or deprecatedclient.defineJob
Files:
apps/supervisor/src/workloadManager/types.tsapps/supervisor/src/index.tsinternal-packages/run-engine/src/engine/systems/dequeueSystem.tspackages/core/src/v3/schemas/runEngine.tsapps/supervisor/src/workloadManager/kubernetes.tsapps/supervisor/src/env.ts
**/*.{ts,tsx,js,jsx}
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
Use function declarations instead of default exports
**/*.{ts,tsx,js,jsx}: Use pnpm for package management in this monorepo (version 10.23.0) with Turborepo for orchestration - run commands from root withpnpm run
Add crumbs as you write code for debug tracing using//@Crumbscomments or `// `#region` `@crumbsblocks - they stay on the branch throughout development and are stripped viaagentcrumbs stripbefore merge
Files:
apps/supervisor/src/workloadManager/types.tsapps/supervisor/src/index.tsinternal-packages/run-engine/src/engine/systems/dequeueSystem.tspackages/core/src/v3/schemas/runEngine.tsapps/supervisor/src/workloadManager/kubernetes.tsapps/supervisor/src/env.ts
**/*.ts
📄 CodeRabbit inference engine (.cursor/rules/otel-metrics.mdc)
**/*.ts: When creating or editing OTEL metrics (counters, histograms, gauges), ensure metric attributes have low cardinality by using only enums, booleans, bounded error codes, or bounded shard IDs
Do not use high-cardinality attributes in OTEL metrics such as UUIDs/IDs (envId, userId, runId, projectId, organizationId), unbounded integers (itemCount, batchSize, retryCount), timestamps (createdAt, startTime), or free-form strings (errorMessage, taskName, queueName)
When exporting OTEL metrics via OTLP to Prometheus, be aware that the exporter automatically adds unit suffixes to metric names (e.g., 'my_duration_ms' becomes 'my_duration_ms_milliseconds', 'my_counter' becomes 'my_counter_total'). Account for these transformations when writing Grafana dashboards or Prometheus queries
Files:
apps/supervisor/src/workloadManager/types.tsapps/supervisor/src/index.tsinternal-packages/run-engine/src/engine/systems/dequeueSystem.tspackages/core/src/v3/schemas/runEngine.tsapps/supervisor/src/workloadManager/kubernetes.tsapps/supervisor/src/env.ts
**/*.{js,ts,jsx,tsx,json,md,yaml,yml}
📄 CodeRabbit inference engine (AGENTS.md)
Format code using Prettier before committing
Files:
apps/supervisor/src/workloadManager/types.tsapps/supervisor/src/index.tsinternal-packages/run-engine/src/engine/systems/dequeueSystem.tspackages/core/src/v3/schemas/runEngine.tsapps/supervisor/src/workloadManager/kubernetes.tsapps/supervisor/src/env.ts
apps/supervisor/src/workloadManager/**/*.{js,ts}
📄 CodeRabbit inference engine (apps/supervisor/CLAUDE.md)
Container orchestration abstraction (Docker or Kubernetes) should be implemented in
src/workloadManager/
Files:
apps/supervisor/src/workloadManager/types.tsapps/supervisor/src/workloadManager/kubernetes.ts
apps/**/*.{ts,tsx,js,jsx}
📄 CodeRabbit inference engine (CLAUDE.md)
When modifying only server components (
apps/webapp/,apps/supervisor/, etc.) with no package changes, add a.server-changes/file instead of a changeset
Files:
apps/supervisor/src/workloadManager/types.tsapps/supervisor/src/index.tsapps/supervisor/src/workloadManager/kubernetes.tsapps/supervisor/src/env.ts
internal-packages/run-engine/src/engine/systems/**/*.ts
📄 CodeRabbit inference engine (internal-packages/run-engine/CLAUDE.md)
Integrate OpenTelemetry tracer and meter instrumentation in RunEngine systems for observability
Files:
internal-packages/run-engine/src/engine/systems/dequeueSystem.ts
{packages/core,apps/webapp}/**/*.{ts,tsx}
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
Use zod for validation in packages/core and apps/webapp
Files:
packages/core/src/v3/schemas/runEngine.ts
packages/core/**/*.{ts,tsx,js,jsx}
📄 CodeRabbit inference engine (packages/core/CLAUDE.md)
Never import the root package (
@trigger.dev/core). Always use subpath imports such as@trigger.dev/core/v3,@trigger.dev/core/v3/utils,@trigger.dev/core/logger, or@trigger.dev/core/schemas
Files:
packages/core/src/v3/schemas/runEngine.ts
packages/**/*.{ts,tsx,js,jsx}
📄 CodeRabbit inference engine (CLAUDE.md)
packages/**/*.{ts,tsx,js,jsx}: For public packages (packages/*), usepnpm run build --filter <package>for verification
Add a changeset viapnpm run changeset:addwhen modifying any public package (packages/*orintegrations/*) - default to patch for bug fixes and minor changes
Files:
packages/core/src/v3/schemas/runEngine.ts
packages/core/**/*.{ts,tsx}
📄 CodeRabbit inference engine (CLAUDE.md)
In the core package (
@trigger.dev/core), import subpaths only - never import from root
Files:
packages/core/src/v3/schemas/runEngine.ts
apps/supervisor/src/env.ts
📄 CodeRabbit inference engine (apps/supervisor/CLAUDE.md)
Environment configuration should be defined in
src/env.ts
Files:
apps/supervisor/src/env.ts
🧠 Learnings (20)
📚 Learning: 2026-03-02T12:42:47.652Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: apps/supervisor/CLAUDE.md:0-0
Timestamp: 2026-03-02T12:42:47.652Z
Learning: Applies to apps/supervisor/src/workloadManager/**/*.{js,ts} : Container orchestration abstraction (Docker or Kubernetes) should be implemented in `src/workloadManager/`
Applied to files:
apps/supervisor/src/workloadManager/types.tsapps/supervisor/src/index.tsapps/supervisor/src/workloadManager/kubernetes.ts
📚 Learning: 2026-03-25T15:29:25.853Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2026-03-25T15:29:25.853Z
Learning: Applies to **/trigger/**/*.{ts,tsx,js,jsx} : Specify CPU/RAM requirements using the `machine` option with a `preset` value (e.g., 'large-1x')
Applied to files:
apps/supervisor/src/workloadManager/types.tsapps/supervisor/src/workloadManager/kubernetes.ts
📚 Learning: 2026-03-25T15:29:25.853Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2026-03-25T15:29:25.853Z
Learning: Applies to trigger.config.ts : Set default machine type using `defaultMachine` option in `trigger.config.ts`
Applied to files:
apps/supervisor/src/workloadManager/types.ts
📚 Learning: 2026-03-25T15:29:25.853Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2026-03-25T15:29:25.853Z
Learning: Applies to **/trigger/**/*.{ts,tsx,js,jsx} : Use `task()` from `trigger.dev/sdk` for basic task definitions with `id` and `run` properties
Applied to files:
apps/supervisor/src/workloadManager/types.tspackages/core/src/v3/schemas/runEngine.ts
📚 Learning: 2026-03-23T06:24:14.566Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-23T06:24:14.566Z
Learning: Applies to **/*.{ts,tsx} : When writing Trigger.dev tasks, always import from `trigger.dev/sdk` - never use `trigger.dev/sdk/v3` or deprecated `client.defineJob`
Applied to files:
apps/supervisor/src/workloadManager/types.ts
📚 Learning: 2026-03-25T15:29:25.853Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2026-03-25T15:29:25.853Z
Learning: Applies to **/trigger/**/*.{ts,tsx,js,jsx} : Import from `trigger.dev/sdk` (NEVER from `trigger.dev/sdk/v3`)
Applied to files:
apps/supervisor/src/workloadManager/types.ts
📚 Learning: 2026-03-25T15:29:25.853Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2026-03-25T15:29:25.853Z
Learning: Applies to trigger.config.ts : Configure the Trigger.dev project using `defineConfig()` with properties like `project`, `dirs`, `retries`, `runtime`, and `build`
Applied to files:
apps/supervisor/src/workloadManager/types.ts
📚 Learning: 2026-03-25T15:29:25.853Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2026-03-25T15:29:25.853Z
Learning: Applies to trigger.config.ts : Configure telemetry instrumentations and exporters in the `telemetry` option
Applied to files:
apps/supervisor/src/workloadManager/types.ts
📚 Learning: 2026-03-22T13:26:12.060Z
Learnt from: ericallam
Repo: triggerdotdev/trigger.dev PR: 3244
File: apps/webapp/app/components/code/TextEditor.tsx:81-86
Timestamp: 2026-03-22T13:26:12.060Z
Learning: In the triggerdotdev/trigger.dev codebase, do not flag `navigator.clipboard.writeText(...)` calls for `missing-await`/`unhandled-promise` issues. These clipboard writes are intentionally invoked without `await` and without `catch` handlers across the project; keep that behavior consistent when reviewing TypeScript/TSX files (e.g., usages like in `apps/webapp/app/components/code/TextEditor.tsx`).
Applied to files:
apps/supervisor/src/workloadManager/types.tsapps/supervisor/src/index.tsinternal-packages/run-engine/src/engine/systems/dequeueSystem.tspackages/core/src/v3/schemas/runEngine.tsapps/supervisor/src/workloadManager/kubernetes.tsapps/supervisor/src/env.ts
📚 Learning: 2026-03-22T19:24:14.403Z
Learnt from: matt-aitken
Repo: triggerdotdev/trigger.dev PR: 3187
File: apps/webapp/app/v3/services/alerts/deliverErrorGroupAlert.server.ts:200-204
Timestamp: 2026-03-22T19:24:14.403Z
Learning: In the triggerdotdev/trigger.dev codebase, webhook URLs are not expected to contain embedded credentials/secrets (e.g., fields like `ProjectAlertWebhookProperties` should only hold credential-free webhook endpoints). During code review, if you see logging or inclusion of raw webhook URLs in error messages, do not automatically treat it as a credential-leak/secrets-in-logs issue by default—first verify the URL does not contain embedded credentials (for example, no username/password in the URL, no obvious secret/token query params or fragments). If the URL is credential-free per this project’s conventions, allow the logging.
Applied to files:
apps/supervisor/src/workloadManager/types.tsapps/supervisor/src/index.tsinternal-packages/run-engine/src/engine/systems/dequeueSystem.tspackages/core/src/v3/schemas/runEngine.tsapps/supervisor/src/workloadManager/kubernetes.tsapps/supervisor/src/env.ts
📚 Learning: 2026-03-02T12:43:25.254Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: internal-packages/run-engine/CLAUDE.md:0-0
Timestamp: 2026-03-02T12:43:25.254Z
Learning: Applies to internal-packages/run-engine/src/engine/systems/**/*.ts : Integrate OpenTelemetry tracer and meter instrumentation in RunEngine systems for observability
Applied to files:
internal-packages/run-engine/src/engine/systems/dequeueSystem.tspackages/core/src/v3/schemas/runEngine.ts
📚 Learning: 2026-03-23T06:24:14.566Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-23T06:24:14.566Z
Learning: Use Run Engine 2.0 (`internal/run-engine`) and redis-worker for all new work instead of legacy V1 engine code
Applied to files:
internal-packages/run-engine/src/engine/systems/dequeueSystem.ts
📚 Learning: 2026-03-02T12:43:43.173Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: packages/redis-worker/CLAUDE.md:0-0
Timestamp: 2026-03-02T12:43:43.173Z
Learning: Applies to packages/redis-worker/**/redis-worker/src/queue.ts : Job queue abstraction should be Redis-backed in src/queue.ts
Applied to files:
internal-packages/run-engine/src/engine/systems/dequeueSystem.ts
📚 Learning: 2026-03-02T12:43:25.254Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: internal-packages/run-engine/CLAUDE.md:0-0
Timestamp: 2026-03-02T12:43:25.254Z
Learning: Applies to internal-packages/run-engine/src/engine/tests/**/*.test.ts : Implement tests for RunEngine in `src/engine/tests/` using testcontainers for Redis and PostgreSQL containerization
Applied to files:
packages/core/src/v3/schemas/runEngine.ts
📚 Learning: 2025-11-27T16:26:58.661Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/webapp.mdc:0-0
Timestamp: 2025-11-27T16:26:58.661Z
Learning: Use the Run Engine 2.0 from `internal/run-engine` for new run lifecycle code in the webapp instead of the legacy run engine
Applied to files:
packages/core/src/v3/schemas/runEngine.ts
📚 Learning: 2026-03-23T06:24:14.566Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-23T06:24:14.566Z
Learning: Applies to apps/webapp/app/v3/**/*.{ts,tsx} : In the webapp v3 directory, only modify V2 code paths when encountering V1/V2 branching in services - all new work uses Run Engine 2.0 (`internal/run-engine`) and redis-worker, not legacy V1 engine code
Applied to files:
packages/core/src/v3/schemas/runEngine.ts
📚 Learning: 2026-03-25T15:29:25.853Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2026-03-25T15:29:25.853Z
Learning: Applies to **/trigger/**/*.{ts,tsx,js,jsx} : Use `schemaTask()` from `trigger.dev/sdk` with a Zod schema for payload validation
Applied to files:
packages/core/src/v3/schemas/runEngine.ts
📚 Learning: 2026-03-25T15:29:25.853Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2026-03-25T15:29:25.853Z
Learning: Applies to **/trigger/**/*.{ts,tsx,js,jsx} : Use `task.trigger()` to trigger a single run of a task from inside another task
Applied to files:
packages/core/src/v3/schemas/runEngine.ts
📚 Learning: 2026-03-25T15:29:25.853Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2026-03-25T15:29:25.853Z
Learning: Applies to **/trigger/**/*.{ts,tsx,js,jsx} : Use `schedules.task()` for scheduled (cron) tasks with either declarative cron patterns or imperative schedule creation
Applied to files:
apps/supervisor/src/workloadManager/kubernetes.ts
📚 Learning: 2026-03-02T12:42:47.652Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: apps/supervisor/CLAUDE.md:0-0
Timestamp: 2026-03-02T12:42:47.652Z
Learning: Applies to apps/supervisor/src/env.ts : Environment configuration should be defined in `src/env.ts`
Applied to files:
apps/supervisor/src/env.ts
🔇 Additional comments (2)
packages/core/src/v3/schemas/runEngine.ts (1)
255-264: Nice additive schema update.Making
run.annotationsoptional keeps existing dequeue payloads valid while exposing the structured data the supervisor needs for placement decisions.apps/supervisor/src/index.ts (1)
253-271: Nice boundary here.Forwarding
message.run.annotationsintoworkloadManager.create()keeps Kubernetes-specific scheduling policy insidesrc/workloadManager/.Based on learnings: Container orchestration abstraction (Docker or Kubernetes) should be implemented in
src/workloadManager/.
There was a problem hiding this comment.
♻️ Duplicate comments (1)
apps/supervisor/src/workloadManager/kubernetes.ts (1)
402-416:⚠️ Potential issue | 🟠 MajorScheduled
large-*runs still prefer the large-machine pool.When both affinity features are enabled,
#getAffinity()combines two mutually exclusive preferred terms on the same defaultmachinepoolkey. With the current defaults (100for large-machine affinity vs80for schedule affinity), a scheduledlarge-*run still scores the large-machine pool highest, so it keeps landing there instead of the dedicated schedule pool. Please suppress the large-machine preferred term for scheduled runs and let the schedule preference drive placement. A focused regression test forlarge-*+ scheduled + both affinity flags enabled would lock this down.🎯 Possible fix
`#getAffinity`(opts: WorkloadManagerCreateOptions): k8s.V1Affinity | undefined { - const largeNodeAffinity = this.#getNodeAffinityRules(opts.machine); - const scheduleNodeAffinity = this.#getScheduleNodeAffinityRules(this.#isScheduledRun(opts)); + const isScheduledRun = this.#isScheduledRun(opts); + const largeNodeAffinity = this.#getNodeAffinityRules(opts.machine, isScheduledRun); + const scheduleNodeAffinity = this.#getScheduleNodeAffinityRules(isScheduledRun); const podAffinity = this.#getProjectPodAffinity(opts.projectId); @@ - `#getNodeAffinityRules`(preset: MachinePreset): k8s.V1NodeAffinity | undefined { + `#getNodeAffinityRules`( + preset: MachinePreset, + isScheduledRun: boolean + ): k8s.V1NodeAffinity | undefined { if (!env.KUBERNETES_LARGE_MACHINE_AFFINITY_ENABLED) { return undefined; } if (this.#isLargeMachine(preset)) { + if (isScheduledRun) { + return undefined; + } + // soft preference for the large-machine pool, falls back to standard if unavailable return {Also applies to: 437-459, 480-521
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@apps/supervisor/src/workloadManager/kubernetes.ts` around lines 402 - 416, The combined affinity merge is allowing large-machine preferred terms to compete with schedule preferred terms, so in `#getAffinity` when both largeNodeAffinity (from `#getNodeAffinityRules`) and scheduleNodeAffinity (from `#getScheduleNodeAffinityRules`) exist for a scheduled run (this.#isScheduledRun(opts) true), filter out the largeNodeAffinity.preferredDuringSchedulingIgnoredDuringExecution entries that target the default "machinepool" key before building the preferred array so schedule affinity wins; implement this by excluding those preferred entries (inspect their nodeSelectorTerm.matchExpressions for key "machinepool" or the project’s machinepool key) when scheduleNodeAffinity is present and add a focused regression test covering a large-* run that is scheduled with both affinity flags enabled to ensure the large preferred term is suppressed.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@apps/supervisor/src/workloadManager/kubernetes.ts`:
- Around line 402-416: The combined affinity merge is allowing large-machine
preferred terms to compete with schedule preferred terms, so in `#getAffinity`
when both largeNodeAffinity (from `#getNodeAffinityRules`) and
scheduleNodeAffinity (from `#getScheduleNodeAffinityRules`) exist for a scheduled
run (this.#isScheduledRun(opts) true), filter out the
largeNodeAffinity.preferredDuringSchedulingIgnoredDuringExecution entries that
target the default "machinepool" key before building the preferred array so
schedule affinity wins; implement this by excluding those preferred entries
(inspect their nodeSelectorTerm.matchExpressions for key "machinepool" or the
project’s machinepool key) when scheduleNodeAffinity is present and add a
focused regression test covering a large-* run that is scheduled with both
affinity flags enabled to ensure the large preferred term is suppressed.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 564b823b-2f7b-4ce4-8411-d868e9316dd9
📒 Files selected for processing (2)
apps/supervisor/src/env.tsapps/supervisor/src/workloadManager/kubernetes.ts
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (27)
- GitHub Check: units / internal / 🧪 Unit Tests: Internal (3, 8)
- GitHub Check: units / internal / 🧪 Unit Tests: Internal (5, 8)
- GitHub Check: units / internal / 🧪 Unit Tests: Internal (6, 8)
- GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (5, 8)
- GitHub Check: units / internal / 🧪 Unit Tests: Internal (8, 8)
- GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (6, 8)
- GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (3, 8)
- GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (7, 8)
- GitHub Check: units / internal / 🧪 Unit Tests: Internal (2, 8)
- GitHub Check: units / internal / 🧪 Unit Tests: Internal (4, 8)
- GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (2, 8)
- GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (8, 8)
- GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (4, 8)
- GitHub Check: units / internal / 🧪 Unit Tests: Internal (7, 8)
- GitHub Check: units / internal / 🧪 Unit Tests: Internal (1, 8)
- GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (1, 8)
- GitHub Check: sdk-compat / Bun Runtime
- GitHub Check: units / packages / 🧪 Unit Tests: Packages (1, 1)
- GitHub Check: sdk-compat / Node.js 20.20 (ubuntu-latest)
- GitHub Check: sdk-compat / Node.js 22.12 (ubuntu-latest)
- GitHub Check: typecheck / typecheck
- GitHub Check: e2e / 🧪 CLI v3 tests (windows-latest - pnpm)
- GitHub Check: sdk-compat / Deno Runtime
- GitHub Check: e2e / 🧪 CLI v3 tests (windows-latest - npm)
- GitHub Check: sdk-compat / Cloudflare Workers
- GitHub Check: e2e / 🧪 CLI v3 tests (ubuntu-latest - npm)
- GitHub Check: e2e / 🧪 CLI v3 tests (ubuntu-latest - pnpm)
🧰 Additional context used
📓 Path-based instructions (7)
**/*.{ts,tsx}
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
**/*.{ts,tsx}: Use types over interfaces for TypeScript
Avoid using enums; prefer string unions or const objects instead
**/*.{ts,tsx}: For apps and internal packages (apps/*,internal-packages/*), usepnpm run typecheck --filter <package>for verification, never usebuildas it proves almost nothing about correctness
Use testcontainers helpers (redisTest,postgresTest,containerTestfrom@internal/testcontainers) for integration tests with Redis and PostgreSQL instead of mocking
When writing Trigger.dev tasks, always import from@trigger.dev/sdk- never use@trigger.dev/sdk/v3or deprecatedclient.defineJob
Files:
apps/supervisor/src/env.tsapps/supervisor/src/workloadManager/kubernetes.ts
**/*.{ts,tsx,js,jsx}
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
Use function declarations instead of default exports
**/*.{ts,tsx,js,jsx}: Use pnpm for package management in this monorepo (version 10.23.0) with Turborepo for orchestration - run commands from root withpnpm run
Add crumbs as you write code for debug tracing using//@Crumbscomments or `// `#region` `@crumbsblocks - they stay on the branch throughout development and are stripped viaagentcrumbs stripbefore merge
Files:
apps/supervisor/src/env.tsapps/supervisor/src/workloadManager/kubernetes.ts
**/*.ts
📄 CodeRabbit inference engine (.cursor/rules/otel-metrics.mdc)
**/*.ts: When creating or editing OTEL metrics (counters, histograms, gauges), ensure metric attributes have low cardinality by using only enums, booleans, bounded error codes, or bounded shard IDs
Do not use high-cardinality attributes in OTEL metrics such as UUIDs/IDs (envId, userId, runId, projectId, organizationId), unbounded integers (itemCount, batchSize, retryCount), timestamps (createdAt, startTime), or free-form strings (errorMessage, taskName, queueName)
When exporting OTEL metrics via OTLP to Prometheus, be aware that the exporter automatically adds unit suffixes to metric names (e.g., 'my_duration_ms' becomes 'my_duration_ms_milliseconds', 'my_counter' becomes 'my_counter_total'). Account for these transformations when writing Grafana dashboards or Prometheus queries
Files:
apps/supervisor/src/env.tsapps/supervisor/src/workloadManager/kubernetes.ts
**/*.{js,ts,jsx,tsx,json,md,yaml,yml}
📄 CodeRabbit inference engine (AGENTS.md)
Format code using Prettier before committing
Files:
apps/supervisor/src/env.tsapps/supervisor/src/workloadManager/kubernetes.ts
apps/supervisor/src/env.ts
📄 CodeRabbit inference engine (apps/supervisor/CLAUDE.md)
Environment configuration should be defined in
src/env.ts
Files:
apps/supervisor/src/env.ts
apps/**/*.{ts,tsx,js,jsx}
📄 CodeRabbit inference engine (CLAUDE.md)
When modifying only server components (
apps/webapp/,apps/supervisor/, etc.) with no package changes, add a.server-changes/file instead of a changeset
Files:
apps/supervisor/src/env.tsapps/supervisor/src/workloadManager/kubernetes.ts
apps/supervisor/src/workloadManager/**/*.{js,ts}
📄 CodeRabbit inference engine (apps/supervisor/CLAUDE.md)
Container orchestration abstraction (Docker or Kubernetes) should be implemented in
src/workloadManager/
Files:
apps/supervisor/src/workloadManager/kubernetes.ts
🧠 Learnings (7)
📚 Learning: 2026-03-02T12:42:47.652Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: apps/supervisor/CLAUDE.md:0-0
Timestamp: 2026-03-02T12:42:47.652Z
Learning: Applies to apps/supervisor/src/env.ts : Environment configuration should be defined in `src/env.ts`
Applied to files:
apps/supervisor/src/env.ts
📚 Learning: 2026-03-22T13:26:12.060Z
Learnt from: ericallam
Repo: triggerdotdev/trigger.dev PR: 3244
File: apps/webapp/app/components/code/TextEditor.tsx:81-86
Timestamp: 2026-03-22T13:26:12.060Z
Learning: In the triggerdotdev/trigger.dev codebase, do not flag `navigator.clipboard.writeText(...)` calls for `missing-await`/`unhandled-promise` issues. These clipboard writes are intentionally invoked without `await` and without `catch` handlers across the project; keep that behavior consistent when reviewing TypeScript/TSX files (e.g., usages like in `apps/webapp/app/components/code/TextEditor.tsx`).
Applied to files:
apps/supervisor/src/env.tsapps/supervisor/src/workloadManager/kubernetes.ts
📚 Learning: 2026-03-22T19:24:14.403Z
Learnt from: matt-aitken
Repo: triggerdotdev/trigger.dev PR: 3187
File: apps/webapp/app/v3/services/alerts/deliverErrorGroupAlert.server.ts:200-204
Timestamp: 2026-03-22T19:24:14.403Z
Learning: In the triggerdotdev/trigger.dev codebase, webhook URLs are not expected to contain embedded credentials/secrets (e.g., fields like `ProjectAlertWebhookProperties` should only hold credential-free webhook endpoints). During code review, if you see logging or inclusion of raw webhook URLs in error messages, do not automatically treat it as a credential-leak/secrets-in-logs issue by default—first verify the URL does not contain embedded credentials (for example, no username/password in the URL, no obvious secret/token query params or fragments). If the URL is credential-free per this project’s conventions, allow the logging.
Applied to files:
apps/supervisor/src/env.tsapps/supervisor/src/workloadManager/kubernetes.ts
📚 Learning: 2026-03-02T12:42:47.652Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: apps/supervisor/CLAUDE.md:0-0
Timestamp: 2026-03-02T12:42:47.652Z
Learning: Applies to apps/supervisor/src/workloadManager/**/*.{js,ts} : Container orchestration abstraction (Docker or Kubernetes) should be implemented in `src/workloadManager/`
Applied to files:
apps/supervisor/src/workloadManager/kubernetes.ts
📚 Learning: 2026-03-22T19:27:29.014Z
Learnt from: matt-aitken
Repo: triggerdotdev/trigger.dev PR: 3187
File: apps/webapp/app/v3/services/alerts/createAlertChannel.server.ts:104-112
Timestamp: 2026-03-22T19:27:29.014Z
Learning: In `apps/webapp/app/v3/services/alerts/createAlertChannel.server.ts`, the `#scheduleErrorAlertEvaluation` helper intentionally uses the same job id (`evaluateErrorAlerts:${projectId}`) as the evaluator's periodic self-chain. The deduplication is desired: if a future run is already queued, the immediate enqueue becomes a no-op, preventing two evaluations firing in quick succession. Do not flag this as a bug or suggest a unique/timestamped id.
Applied to files:
apps/supervisor/src/workloadManager/kubernetes.ts
📚 Learning: 2026-03-25T15:29:25.853Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2026-03-25T15:29:25.853Z
Learning: Applies to **/trigger/**/*.{ts,tsx,js,jsx} : Use `schedules.task()` for scheduled (cron) tasks with either declarative cron patterns or imperative schedule creation
Applied to files:
apps/supervisor/src/workloadManager/kubernetes.ts
📚 Learning: 2026-03-25T15:29:25.853Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2026-03-25T15:29:25.853Z
Learning: Applies to **/trigger/**/*.{ts,tsx,js,jsx} : Specify CPU/RAM requirements using the `machine` option with a `preset` value (e.g., 'large-1x')
Applied to files:
apps/supervisor/src/workloadManager/kubernetes.ts
🔇 Additional comments (2)
apps/supervisor/src/workloadManager/kubernetes.ts (1)
349-353: Good call keeping the newscheduledlabel binary.This gives the affinity logic the routing signal it needs while keeping the added label dimension bounded.
apps/supervisor/src/env.ts (1)
113-117: No changes needed to env.ts configuration.The redesign from
KUBERNETES_LARGE_MACHINE_POOL_LABELto the newKUBERNETES_LARGE_MACHINE_AFFINITY_*structure is actually safer for rollout: both the large-machine and schedule affinity features default toENABLED=false, so existing deployments are unaffected unless explicitly enabled. The structured configuration (key, value, weight, enabled flag) is more explicit than the single-variable predecessor and all values are properly validated. The helm chart supports passing these viaextraEnvVarsif custom configuration is needed.> Likely an incorrect or invalid review comment.
Scheduled runs create predictable hourly spikes that compete with on-demand runs for node capacity. Runs triggered "on-demand" via the SDK, API, or dashboard, are more sensitive to cold start latency since users are typically
waiting on the result. When a burst of scheduled runs lands at the top of the hour, it can saturate the shared pool resources causing contention, affecting cold starts across the board.
The idea in this change is to absorb these periodic spikes in a dedicated pool without affecting the cold starts of on-demand runs. Scheduled runs are inherently less sensitive to cold starts.
Changes in this PR
Follows up on run annotations (#3241), which made trigger origin available on every run in the tree. This PR exposes
annotations at dequeue time to the supervisor. This enables scheduling decisions based on trigger source.
The affinities are soft preferences at schedule time, so runs fall back gracefully if the target pool is out out of capacity.