Skip to content

libnetwork/networkdb: always shut down memberlist #50115

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

corhere
Copy link
Contributor

@corhere corhere commented May 30, 2025

- What I did
- How I did it
Gracefully leaving the memberlist cluster is a best-effort operation. Failing to successfully broadcast the leave message to a peer should not prevent NetworkDB from cleaning up the memberlist instance on close. But that was not the case in practice. Log the error returned from (*memberlist.Memberlist).Leave instead of returning it and proceed with shutting down irrespective of whether Leave() returns an error.

- How to verify it
By inspection.

- Human readable description for the release notes

Fix a potential resource leak when a node leaves a Swarm

- A picture of a cute animal (not mandatory but encouraged)

Gracefully leaving the memberlist cluster is a best-effort operation.
Failing to successfully broadcast the leave message to a peer should not
prevent NetworkDB from cleaning up the memberlist instance on close. But
that was not the case in practice. Log the error returned from
(*memberlist.Memberlist).Leave instead of returning it and proceed with
shutting down irrespective of whether Leave() returns an error.

Signed-off-by: Cory Snider <[email protected]>
Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR ensures that memberlist is always shut down by logging leave errors instead of returning them, preventing resource leaks when a node leaves the cluster.

  • Switched mlist.Leave failure handling from returning an error to logging it and proceeding with shutdown
  • Updated sendNodeEvent and mlist.Leave logs to use structured WithError style
Comments suppressed due to low confidence (2)

libnetwork/networkdb/cluster.go:231

  • Add a unit or integration test to verify that even if mlist.Leave returns an error, the rest of clusterLeave still executes and memberlist is cleaned up.
log.G(context.TODO()).WithError(err).Error("failed to broadcast memberlist leave message")

libnetwork/networkdb/cluster.go:231

  • [nitpick] Consider using consistent terminology for leave operations (e.g., "leave event" vs. "leave message") to avoid confusion in logs.
log.G(context.TODO()).WithError(err).Error("failed to broadcast memberlist leave message")

Copy link
Member

@thaJeztah thaJeztah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@thaJeztah thaJeztah modified the milestones: 29.0.0, 28.2.3 Jun 2, 2025
@vvoland vvoland modified the milestones: 28.2.3, 28.3.0 Jun 11, 2025
Copy link
Contributor

@vvoland vvoland left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, the CI failures are unrelated (buildkit tests) and I can't easily restart them (because the artifacts already expired) so let me just merge the PR.

@vvoland vvoland merged commit c430c9c into moby:master Jun 12, 2025
199 of 203 checks passed
@corhere corhere deleted the libn/fix-47859-networkdb-clusterleave-leak branch June 12, 2025 16:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

A potential goleak in cluster.go
4 participants