Skip to content

feat(go): Implement metrics and tracing for http and grpc servers#5925

Merged
shuchu merged 7 commits intofeast-dev:masterfrom
luisazofracabify:feat/go-server-observability
Feb 19, 2026
Merged

feat(go): Implement metrics and tracing for http and grpc servers#5925
shuchu merged 7 commits intofeast-dev:masterfrom
luisazofracabify:feat/go-server-observability

Conversation

@luisazofracabify
Copy link
Contributor

@luisazofracabify luisazofracabify commented Jan 30, 2026

Description

This PR significantly improves the observability and reliability of the Feast Go Feature Server by implementing comprehensive Prometheus metrics, and robust configuration options for both HTTP and gRPC modes. It addresses issues with hardcoded ports, inconsistent metric exposure, and potential race conditions during shutdown.

Changes

1. Prometheus Metrics Instrumentation

  • Unified Metrics Server: Implemented a dedicated, configurable HTTP server (default port :9090) for serving metrics in both HTTP and gRPC modes. This unifies the observability strategy.
  • HTTP Server Instrumentation: Added metricsMiddleware in internal/feast/server/http_server.go to track:
    • Request duration (http_request_duration_seconds)
    • Request counts by method, path, and status (http_requests_total)
    • Instrumented the /health endpoint for readiness probe visibility.
  • gRPC Metrics: Integrated go-grpc-prometheus to fully expose standard gRPC server metrics.

2. Tracing Enhancements

  • Dynamic Service Name: Improved the existing OpenTelemetry integration by adding support for the OTEL_SERVICE_NAME environment variable (defaults to FeastGoFeatureServer). This allows proper service identification in distributed tracing systems without code changes.

3. Server Configuration & Reliability

  • Configurable Metrics Port: Added -metrics-port flag to main.go.
  • Graceful Shutdown: Implemented sync.WaitGroup in StartHttpServer to ensure clean shutdown of the metrics server and logging services, matching the robust behavior of StartGrpcServer.
  • Code Cleanup: Refactored ServerStarter interfaces and removed unused legacy code.

How Has This Been Tested?

Verification

  • Confirmed /metrics endpoint accessibility on custom ports.
  • Verified /health and request metrics counters.
  • Validated status 200 recording fix.
  • Tested OTEL_SERVICE_NAME env var reflects in traces.
  • Verified clean shutdown logs (SIGINT/SIGTERM).

Checklist

  • My code follows the code style of this project.
  • I have added tests that prove my fix is effective or that my feature works.
  • I have updated the dependencies (go.mod / go.sum).
  • All new and existing tests passed.

Why gotoprom

It reduces boilerplate code for histograms and ensures type-safety for labels, preventing runtime panics due to mismatched label cardinality. It wraps the official prometheus client, so it's fully compatible


Open with Devin

@luisazofracabify luisazofracabify requested a review from a team as a code owner January 30, 2026 11:53
Copy link
Contributor

@devin-ai-integration devin-ai-integration bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 6 additional flags.

Open in Devin Review

@luisazofracabify luisazofracabify force-pushed the feat/go-server-observability branch from 82aca5b to b876dba Compare January 30, 2026 11:58
devin-ai-integration[bot]

This comment was marked as resolved.

@luisazofracabify luisazofracabify force-pushed the feat/go-server-observability branch from b876dba to 8b5cd4b Compare January 30, 2026 12:30
devin-ai-integration[bot]

This comment was marked as resolved.

@luisazofracabify luisazofracabify force-pushed the feat/go-server-observability branch from 8b5cd4b to 847393e Compare January 30, 2026 12:39
@luisazofracabify luisazofracabify changed the title feat(go): implement metrics and tracing for http and grpc servers feat(go): Implement metrics and tracing for http and grpc servers Feb 2, 2026
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would remove this file. Such a test does not actually test that metrics actually record values.

@luisazofracabify luisazofracabify force-pushed the feat/go-server-observability branch 2 times, most recently from 655cd02 to 34de413 Compare February 6, 2026 09:01
devin-ai-integration[bot]

This comment was marked as resolved.

@luisazofracabify luisazofracabify force-pushed the feat/go-server-observability branch from e7a22fc to 2e4c1b3 Compare February 9, 2026 08:21
devin-ai-integration[bot]

This comment was marked as resolved.

@luisazofracabify luisazofracabify force-pushed the feat/go-server-observability branch from a5345c2 to f5449af Compare February 9, 2026 08:37
@luisazofracabify luisazofracabify force-pushed the feat/go-server-observability branch 2 times, most recently from db00427 to b9f99a6 Compare February 11, 2026 08:30
@luisazofracabify luisazofracabify force-pushed the feat/go-server-observability branch from b9f99a6 to 7b552a9 Compare February 11, 2026 12:16
@luisazofracabify luisazofracabify force-pushed the feat/go-server-observability branch from 7b552a9 to 1bf7277 Compare February 11, 2026 12:21
@luisazofracabify luisazofracabify force-pushed the feat/go-server-observability branch from 5d6cc58 to 816423b Compare February 11, 2026 13:27
devin-ai-integration[bot]

This comment was marked as resolved.

Copy link
Collaborator

@shuchu shuchu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@shuchu shuchu merged commit 2b4ec9a into feast-dev:master Feb 19, 2026
23 checks passed
@PepeluDev
Copy link
Contributor

@shuchu It seems like this PR removed some packages from the go.sum file, sorry about that. They should be added back.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants

Comments