-
Notifications
You must be signed in to change notification settings - Fork 18.8k
libnetwork/networkdb: always shut down memberlist #50115
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
libnetwork/networkdb: always shut down memberlist #50115
Conversation
Gracefully leaving the memberlist cluster is a best-effort operation. Failing to successfully broadcast the leave message to a peer should not prevent NetworkDB from cleaning up the memberlist instance on close. But that was not the case in practice. Log the error returned from (*memberlist.Memberlist).Leave instead of returning it and proceed with shutting down irrespective of whether Leave() returns an error. Signed-off-by: Cory Snider <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR ensures that memberlist is always shut down by logging leave errors instead of returning them, preventing resource leaks when a node leaves the cluster.
- Switched
mlist.Leave
failure handling from returning an error to logging it and proceeding with shutdown - Updated
sendNodeEvent
andmlist.Leave
logs to use structuredWithError
style
Comments suppressed due to low confidence (2)
libnetwork/networkdb/cluster.go:231
- Add a unit or integration test to verify that even if
mlist.Leave
returns an error, the rest ofclusterLeave
still executes and memberlist is cleaned up.
log.G(context.TODO()).WithError(err).Error("failed to broadcast memberlist leave message")
libnetwork/networkdb/cluster.go:231
- [nitpick] Consider using consistent terminology for leave operations (e.g., "leave event" vs. "leave message") to avoid confusion in logs.
log.G(context.TODO()).WithError(err).Error("failed to broadcast memberlist leave message")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, the CI failures are unrelated (buildkit tests) and I can't easily restart them (because the artifacts already expired) so let me just merge the PR.
- What I did
- How I did it
Gracefully leaving the memberlist cluster is a best-effort operation. Failing to successfully broadcast the leave message to a peer should not prevent NetworkDB from cleaning up the memberlist instance on close. But that was not the case in practice. Log the error returned from (*memberlist.Memberlist).Leave instead of returning it and proceed with shutting down irrespective of whether Leave() returns an error.
- How to verify it
By inspection.
- Human readable description for the release notes
- A picture of a cute animal (not mandatory but encouraged)