[monitoring] Add per-tenant log routing support in fluent-bit#2195
[monitoring] Add per-tenant log routing support in fluent-bit#2195mattia-eleuteri wants to merge 1 commit intocozystack:mainfrom
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughChanges update Fluent Bit configuration rendered from Changes
Sequence Diagram(s)sequenceDiagram
participant Pod as App Pod
participant FB as Fluent Bit
participant VInsert as vlinsert-generic (tenant)
Pod->>FB: emit logs (kube.* tags)
FB->>FB: FILTER rewrite_tag (per-namespace) -> retag to tenant.{target}.{namespace}.<stream>
FB->>FB: FILTER modify (Match *) -> add tenant {{ .Values.global.target }}
FB->>VInsert: OUTPUT Match tenant.{target}.* -> HTTP/JSON POST to vlinsert-generic.{target}.svc:9481
VInsert-->>FB: 200 OK
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
📝 Coding Plan
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
packages/system/monitoring-agents/values.yaml (1)
382-394: Consider adding input validation for tenantLogRouting entries.If a user provides a
tenantLogRoutingentry without the requirednamespaceorvlogsfields, the template will generate an invalid fluent-bit config that could cause deployment failures with unhelpful error messages.🛡️ Suggested validation approach
Add validation at the start of the range loop:
{{- range .Values.tenantLogRouting }} + {{- if not .namespace }} + {{- fail "tenantLogRouting entry missing required 'namespace' field" }} + {{- end }} + {{- if not .vlogs }} + {{- fail "tenantLogRouting entry missing required 'vlogs' field" }} + {{- end }} [OUTPUT]🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/system/monitoring-agents/values.yaml` around lines 382 - 394, The template iterates over .Values.tenantLogRouting without checking required fields, which can produce invalid fluent-bit OUTPUT blocks when namespace or vlogs are missing; update the range loop over tenantLogRouting to validate each entry by testing that .namespace and .vlogs are non-empty (e.g., using conditional/if checks) and either skip the entry or emit a clear failure/warning message before rendering the OUTPUT stanza for that item; reference the existing range over .Values.tenantLogRouting and the fields .namespace and .vlogs when implementing the checks so only entries with both fields produce the Name http / Match / Host / header blocks.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@packages/system/monitoring-agents/values.yaml`:
- Around line 444-449: The Rule line in the Helm template that builds the
rewrite_tag filter interpolates .Values.tenantLogRouting[].namespace directly
into a regex (the "Rule $kubernetes_namespace_name ^({{ .namespace }})$
tenant.{{ .namespace }}.$TAG false"), which can break when namespaces contain
regex metacharacters; update that template to escape the namespace using Helm's
regexQuoteMeta (or equivalent) when constructing both the capture group and the
replacement token so the generated regex is literal and safe, ensuring you
reference the template block iterating over .Values.tenantLogRouting and the
rewrite_tag Rule to locate and change the interpolation.
---
Nitpick comments:
In `@packages/system/monitoring-agents/values.yaml`:
- Around line 382-394: The template iterates over .Values.tenantLogRouting
without checking required fields, which can produce invalid fluent-bit OUTPUT
blocks when namespace or vlogs are missing; update the range loop over
tenantLogRouting to validate each entry by testing that .namespace and .vlogs
are non-empty (e.g., using conditional/if checks) and either skip the entry or
emit a clear failure/warning message before rendering the OUTPUT stanza for that
item; reference the existing range over .Values.tenantLogRouting and the fields
.namespace and .vlogs when implementing the checks so only entries with both
fields produce the Name http / Match / Host / header blocks.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 8ffb4485-2409-4778-bcf4-7be38544caca
📒 Files selected for processing (1)
packages/system/monitoring-agents/values.yaml
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request addresses the current limitation where Highlights
🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console. Changelog
Activity
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Code Review
The pull request introduces a new Fluent Bit configuration for per-tenant log routing, allowing logs from specific namespaces to be routed to dedicated VLogs services. This involves dynamically generating HTTP output and rewrite tag filter blocks based on user-defined tenantLogRouting values. However, the current implementation has several security vulnerabilities: the .vlogs parameter, used as an HTTP output host, is user-controlled and creates a Server-Side Request Forgery (SSRF) risk; the .namespace parameter, embedded in a regular expression for tag rewriting, is user-controlled and leads to a Regular Expression Denial of Service (ReDoS) vulnerability; and the .namespace parameter, embedded in glob patterns for log matching, is user-controlled and creates a log routing injection vulnerability. Remediation requires strict validation and escaping of these user-controlled parameters, along with refactoring duplicated output configurations for better maintainability.
| {{- if not .namespace }}{{ fail "tenantLogRouting: namespace is required" }}{{- end }} | ||
| [OUTPUT] | ||
| Name http | ||
| Match tenant.{{ .namespace }}.* |
There was a problem hiding this comment.
The .vlogs parameter is directly used as the Host for an HTTP output, which is a user-controlled value. This creates a Server-Side Request Forgery (SSRF) vulnerability, allowing an attacker to specify an arbitrary hostname for Fluent Bit to send logs, potentially scanning internal networks or exfiltrating data. Implement strict validation and allow-listing for .vlogs to restrict it to trusted hosts. Additionally, the configuration for this [OUTPUT] block is duplicated, increasing maintenance overhead. Consider refactoring for future maintainability.
There was a problem hiding this comment.
This is a Helm values.yaml template — .vlogs is set by the platform operator during tenant provisioning, not by end users. There is no user-controlled input path here, so SSRF is not applicable.
| {{- range .Values.tenantLogRouting }} | ||
| [FILTER] | ||
| Name rewrite_tag | ||
| Match kube.* |
There was a problem hiding this comment.
The .namespace parameter is directly embedded into a regular expression within the Rule field of a rewrite_tag filter. This user-controlled value creates a Regular Expression Denial of Service (ReDoS) vulnerability, allowing an attacker to inject malicious regex patterns that could consume excessive CPU and lead to a denial of service. Implement strict validation and escape special characters for .namespace. Additionally, the current regex ^({{ .namespace }})$ uses an unnecessary capturing group; simplifying it could improve clarity and avoid minor overhead.
Rule $kubernetes_namespace_name ^{{ .namespace }}$ tenant.{{ .namespace }}.$TAG falseThere was a problem hiding this comment.
Fixed — regexQuoteMeta is now applied to escape the namespace value before it's used in the regex pattern.
| {{- range .Values.tenantLogRouting }} | ||
| {{- if not .namespace }}{{ fail "tenantLogRouting: namespace is required" }}{{- end }} | ||
| [OUTPUT] | ||
| Name http |
There was a problem hiding this comment.
The .namespace parameter is directly embedded into the Match field of an [OUTPUT] block, which uses glob patterns. An attacker controlling this value could inject glob pattern wildcards (*, ?, []) to broaden log matching criteria.
Impact: This could lead to misrouting of logs, potential information leakage between tenants if a malicious pattern matches other tenants' logs, or a denial of service by causing Fluent Bit to process an excessive amount of logs.
Remediation: Implement strict validation for the .namespace parameter to ensure it only contains valid and expected namespace names. Avoid allowing wildcard characters or other special glob pattern characters in user-supplied namespace values.
There was a problem hiding this comment.
The Match field uses fluent-bit tag matching, not arbitrary glob. The namespace value is set by the platform during provisioning, not by tenant users, so glob injection is not a concern here.
6eb1707 to
70274a7
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@packages/system/monitoring-agents/values.yaml`:
- Around line 382-387: The template validates .namespace but not .vlogs, so
missing vlogs silently defaults Host to localhost; update the tenantLogRouting
loop in values.yaml to check for .vlogs (e.g., add {{- if not .vlogs }}{{ fail
"tenantLogRouting: vlogs is required for namespace <namespace>" }}{{- end }})
before rendering the [OUTPUT] block so rendering fails fast when .vlogs is
omitted; keep the existing Host {{ .vlogs }} usage and include the namespace in
the failure message to aid debugging.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 92ca9944-7a8d-42fc-9f26-3d5af186d017
📒 Files selected for processing (1)
packages/system/monitoring-agents/values.yaml
70274a7 to
28a6826
Compare
kvaps
left a comment
There was a problem hiding this comment.
Thanks for tackling this! The problem is real — tenant VLogs instances receiving no logs is a gap we need to close.
However, I think the manual tenantLogRouting approach has some drawbacks:
- No automation — someone has to manually add entries to
values.yamlevery time a tenant with monitoring is created/deleted. This is error-prone and doesn't scale. - No sub-tenant support — nested tenants won't get their logs routed automatically.
- Diverges from the existing pattern — vmagent already solves the equivalent problem for metrics automatically.
How vmagent does it (for reference)
- Each tenant namespace gets a label
namespace.cozystack.io/monitoring: <monitoring-namespace>(namespace.yaml) - The
cozystack-valuesSecret injects_namespace.monitoringinto every tenant, pointing to the correct monitoring stack - The monitoring-agents HelmRelease for each Kubernetes cluster uses
$targetTenant := .Values._namespace.monitoringto route both metrics and logs to the right destination automatically
For child Kubernetes clusters, fluent-bit already sends all logs to vlinsert-generic.{{ $targetTenant }}.svc — so logs already work automatically there. The gap is only on the root cluster, where all namespace logs go to tenant-root without per-tenant splitting.
Suggested approach
Instead of a manual list in monitoring-agents values, consider generating the fluent-bit routing config at the tenant chart level (packages/apps/tenant/templates/), using the same _namespace.monitoring value that vmagent already relies on. This way:
- Tenant creation automatically sets up log routing
- Sub-tenant inheritance works out of the box (child tenants inherit
_namespace.monitoringfrom parent) - The pattern stays consistent with how metrics collection already works
For reference, PR #1684 added per-tenant vmagent with automatic wiring — the same pattern should be followed for log routing.
28a6826 to
29a4a64
Compare
|
@kvaps Thanks for the detailed review and the pointer to how vmagent handles this! I've reworked the implementation to follow the exact same automatic pattern. Here's what changed: Approach: Instead of a manual How it works:
Also fixed: Default outputs now correctly use Edge cases handled:
Only |
29a4a64 to
da4c3ac
Compare
Replace manual tenantLogRouting list with automatic discovery using Helm lookup on namespace.cozystack.io/monitoring labels, matching the existing vmagent pattern for metrics routing. - Use lookup "v1" "Namespace" to discover tenant namespaces - Generate rewrite_tag filters and HTTP outputs dynamically - Deduplicate targets (multiple namespaces -> one output per target) - Fix default outputs to use vlinsert-generic:9481 (VLCluster) - Remove manual tenantLogRouting configuration Signed-off-by: mattia-eleuteri <[email protected]> Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
da4c3ac to
1e8f354
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
packages/system/monitoring-agents/values.yaml (1)
442-459: Consider order ofmodifyandrewrite_tagfilters for tenant field accuracy.The
modifyfilter at line 442-445 addstenant {{ .Values.global.target }}to all logs (Match *) before therewrite_tagfilters process them. This means logs routed to tenant-specific VLogs instances will still havetenant: cozy-monitoring(or whateverglobal.targetis) rather than the actual tenant target.If tenants should see their own tenant name in the
tenantfield, consider either:
- Moving the tenant modify filter after rewrite_tag and adjusting the match patterns, or
- Adding tenant-specific modify filters after rewrite_tag
This may be intentional if the tenant field is meant for platform-level identification, or could be addressed in the follow-up PR
#2196(metadata redaction).🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/system/monitoring-agents/values.yaml` around lines 442 - 459, The current global modify filter (Name modify that adds tenant {{ .Values.global.target }}) is applied before the rewrite_tag rules, causing tenant-specific routing to keep the global tenant value; update the config so tenant is set after tags are rewritten: either move the global "Name modify Add tenant {{ .Values.global.target }}" block below the rewrite_tag blocks and narrow its Match to only platform-level logs, or add per-namespace tenant-specific modify filters immediately after the rewrite_tag Rule (matching the rewritten tag pattern such as tenant.{{ $monTarget }}.*) and set Add tenant to the corresponding {{ $monTarget }}; adjust Match patterns accordingly so the modify that sets tenant runs after the rewrite_tag transformation (refer to the existing [FILTER] blocks, the Name modify filter and the Name rewrite_tag filter with Rule $kubernetes_namespace_name ^({{ $ns.metadata.name }})$ tenant.{{ $monTarget }}.{{ $ns.metadata.name }}.$TAG).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@packages/system/monitoring-agents/values.yaml`:
- Around line 347-348: Add a missing ExternalName Service entry for
vlinsert-generic in the monitoring-external-services.yaml template: create an
ExternalName service resource named vlinsert-generic that mirrors the existing
vlogs-generic entry but points to vlinsert-generic.tenant-root.svc.cluster.local
and exposes port 9481 so Fluent Bit outputs (references to vlinsert-generic:9481
in values.yaml) can resolve; update the template where other services are
defined (use the vlogs-generic Service as the pattern) and ensure the same
entries are added for the other occurrences noted (lines near 358-359, 369-370,
392).
---
Nitpick comments:
In `@packages/system/monitoring-agents/values.yaml`:
- Around line 442-459: The current global modify filter (Name modify that adds
tenant {{ .Values.global.target }}) is applied before the rewrite_tag rules,
causing tenant-specific routing to keep the global tenant value; update the
config so tenant is set after tags are rewritten: either move the global "Name
modify Add tenant {{ .Values.global.target }}" block below the rewrite_tag
blocks and narrow its Match to only platform-level logs, or add per-namespace
tenant-specific modify filters immediately after the rewrite_tag Rule (matching
the rewritten tag pattern such as tenant.{{ $monTarget }}.*) and set Add tenant
to the corresponding {{ $monTarget }}; adjust Match patterns accordingly so the
modify that sets tenant runs after the rewrite_tag transformation (refer to the
existing [FILTER] blocks, the Name modify filter and the Name rewrite_tag filter
with Rule $kubernetes_namespace_name ^({{ $ns.metadata.name }})$ tenant.{{
$monTarget }}.{{ $ns.metadata.name }}.$TAG).
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 5220f786-1b26-476a-87f9-623825339d61
📒 Files selected for processing (1)
packages/system/monitoring-agents/values.yaml
| Host vlinsert-generic.{{ .Values.global.target }}.svc | ||
| port 9481 |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Search for vlinsert-generic service definitions in the codebase
echo "=== Searching for vlinsert-generic service definitions ==="
rg -n "vlinsert-generic" --type yaml
echo ""
echo "=== Checking monitoring package templates ==="
fd -e yaml . packages/system/monitoring/templates -x grep -l "vlinsert" {} \;Repository: cozystack/cozystack
Length of output: 610
🏁 Script executed:
#!/bin/bash
# Search for all external service definitions
echo "=== Checking for external services definitions ==="
rg -n "ExternalName" --type yaml -A 3 | head -50
echo ""
echo "=== Looking for vlogs references ==="
rg -n "vlogs" --type yaml | grep -i service
echo ""
echo "=== Checking monitoring-external-services.yaml ==="
find . -name "monitoring-external-services.yaml" -exec cat {} \;Repository: cozystack/cozystack
Length of output: 9071
🏁 Script executed:
#!/bin/bash
# Check Victoria Logs configuration and services
echo "=== Searching for victoria logs related configs ==="
fd -name "*.yaml" | xargs grep -l "victoria" | head -20
echo ""
echo "=== Check if vlinsert is mentioned in any documentation ==="
fd -name "*.md" | xargs grep -l "vlinsert" 2>/dev/null || echo "No vlinsert in markdown files"Repository: cozystack/cozystack
Length of output: 684
🏁 Script executed:
#!/bin/bash
# Search for victoria-logs package
echo "=== Finding victoria-logs related files ==="
find . -type d -name "*victoria*logs*" | head -20
echo ""
echo "=== Searching for any vlinsert mention (case-insensitive) ==="
rg -i "vlinsert" --type yaml
echo ""
echo "=== Check monitoring package structure ==="
ls -la packages/system/monitoring/templates/ | head -30
echo ""
echo "=== Check if monitoring package defines any services ==="
fd -e yaml . packages/system/monitoring/templates -exec basename {} \;Repository: cozystack/cozystack
Length of output: 2170
🏁 Script executed:
#!/bin/bash
# Check vlogs configuration in monitoring package
echo "=== Content of vlogs.yaml ==="
cat packages/system/monitoring/templates/vlogs/vlogs.yaml
echo ""
echo "=== Content of vlogs grafana-datasource.yaml ==="
cat packages/system/monitoring/templates/vlogs/grafana-datasource.yamlRepository: cozystack/cozystack
Length of output: 1146
🏁 Script executed:
#!/bin/bash
# Check monitoring-agents deployment/values
echo "=== Check monitoring-agents to understand service discovery ==="
head -100 packages/system/monitoring-agents/values.yaml
echo ""
echo "=== Search for VLogs CRD definitions ==="
rg -i "kind.*vlogs" --type yaml
echo ""
echo "=== Check if VLogs operator creates services automatically ==="
find . -name "*.yaml" -o -name "*.md" | xargs grep -l "VLogs.*service" 2>/dev/null | head -5Repository: cozystack/cozystack
Length of output: 4352
🏁 Script executed:
#!/bin/bash
# Check VLogs CRD to understand service creation
echo "=== Searching for VLogs CRD service-related fields ==="
rg -A 5 -B 5 "vlinsert\|service" packages/system/victoria-metrics-operator/charts/victoria-metrics-operator/crd.yaml | head -100
echo ""
echo "=== Check if there's any documentation about VLogs services ==="
find . -name "*.md" -o -name "README*" | xargs grep -l "VLogs\|vlinsert" 2>/dev/null | head -5Repository: cozystack/cozystack
Length of output: 182
🏁 Script executed:
#!/bin/bash
# Look at the full diff context around the changes
echo "=== Check git diff for this file if available ==="
git log --oneline packages/system/monitoring-agents/values.yaml 2>/dev/null | head -5
echo ""
echo "=== Check recent changes context ==="
git show HEAD:packages/system/cozystack-basics/templates/monitoring-external-services.yaml 2>/dev/null || echo "Not available"Repository: cozystack/cozystack
Length of output: 783
🏁 Script executed:
#!/bin/bash
# Check the actual changes in monitoring-agents/values.yaml
echo "=== Context around lines 340-400 in monitoring-agents/values.yaml ==="
sed -n '340,400p' packages/system/monitoring-agents/values.yaml | cat -n
echo ""
echo "=== Check what the OUTPUT is for ==="
sed -n '330,350p' packages/system/monitoring-agents/values.yaml | cat -nRepository: cozystack/cozystack
Length of output: 4108
🏁 Script executed:
#!/bin/bash
# Final verification: check if VLogs operator would create vlinsert services
echo "=== Search for any indication that vlinsert services are created ==="
rg -i "vlinsert" . --type yaml
echo ""
echo "=== Check if there's a pattern for how services are named in monitoring ==="
rg -i "port.*948|9428" . --type yaml | grep -v node_modules | head -20
echo ""
echo "=== Verify the monitoring namespace and how external services work ==="
grep -r "cozy-monitoring" packages/system/cozystack-basics/templates/monitoring-external-services.yamlRepository: cozystack/cozystack
Length of output: 1308
Add vlinsert-generic ExternalName service to monitoring-external-services.yaml.
Fluent Bit outputs (lines 347-348 and others) send logs to vlinsert-generic:9481 (VictoriaLogs insert endpoint), but this service is not defined in packages/system/cozystack-basics/templates/monitoring-external-services.yaml. Without it, logs will fail to be inserted into VictoriaLogs. Add the missing ExternalName service mapping to vlinsert-generic.tenant-root.svc.cluster.local similar to the existing vlogs-generic service.
Also applies to: 358-359, 369-370, 392
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@packages/system/monitoring-agents/values.yaml` around lines 347 - 348, Add a
missing ExternalName Service entry for vlinsert-generic in the
monitoring-external-services.yaml template: create an ExternalName service
resource named vlinsert-generic that mirrors the existing vlogs-generic entry
but points to vlinsert-generic.tenant-root.svc.cluster.local and exposes port
9481 so Fluent Bit outputs (references to vlinsert-generic:9481 in values.yaml)
can resolve; update the template where other services are defined (use the
vlogs-generic Service as the pattern) and ensure the same entries are added for
the other occurrences noted (lines near 358-359, 369-370, 392).
Summary
tenantLogRoutingparameter tomonitoring-agentsfluent-bit configrewrite_tagfilters to match logs bykubernetes_namespace_nameand re-tag them astenant.<namespace>.*Problem
Currently,
monitoring-agentsfluent-bit sends all container logs tovlogs-generic.<target>.svc(typicallytenant-root). Tenants that deploy their own monitoring stack get a VLogs instance but it receives no logs.Workaround: Manual patch of the fluent-bit ConfigMap with
rewrite_tag+ per-tenant OUTPUT blocks.Usage
Closes #2194 (partially)
Test plan
helm templatemonitoring-agents withtenantLogRoutingpopulated🤖 Generated with Claude Code
Summary by CodeRabbit