Skip to content

libn/networkdb: don't exceed broadcast size limit #49939

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
May 9, 2025

Conversation

corhere
Copy link
Contributor

@corhere corhere commented May 7, 2025

- What I did
NetworkDB uses a hierarchy of queues to prioritize messages for broadcast. Unfortunately the logic to pull from multiple queues is flawed. The length of the messages pulled from the first queue is not taken into account when pulling messages from the second queue. A list of messages up to tiwce as long as the limit could be returned! Messages beyond the limit will be truncated unceremoniously by memberlist.

Memberlist broadcast queues assume that all messages returned from a GetBroadcasts call will be broadcasted to other nodes in the cluster. Messages are popped from the queue once they have hit their retransmit limit. On a busy system messages may be broadcast fewer times than intended, possibly even being dropped without ever being broadcast!

I fixed this oversight so that the delegate never returns more data than the specified length limit.

- How I did it
Subtract the length of messages pulled from the first queue from the broadcast size limit so the limit is not exceeded when pulling from the second queue.

- How to verify it
By inspection 🤷

- Human readable description for the release notes

Improvements to the reliability and convergence speed of NetworkDB

- A picture of a cute animal (not mandatory but encouraged)

NetworkDB uses a hierarchy of queues to prioritize messages for
broadcast. Unfortunately the logic to pull from multiple queues is
flawed. The length of the messages pulled from the first queue is not
taken into account when pulling messages from the second queue. A list
of messages up to tiwce as long as the limit could be returned! Messages
beyond the limit will be truncated unceremoniously by memberlist.

Memberlist broadcast queues assume that all messages returned from a
GetBroadcasts call will be broadcasted to other nodes in the cluster.
Messages are popped from the queue once they have hit their retransmit
limit. On a busy system messages may be broadcast fewer times than
intended, possibly even being dropped without ever being broadcast!

Subtract the length of messages pulled from the first queue from the
broadcast size limit so the limit is not exceeded when pulling from the
second queue.

Signed-off-by: Cory Snider <[email protected]>
Copy link
Member

@thaJeztah thaJeztah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@thaJeztah thaJeztah added this to the 28.2.0 milestone May 9, 2025
@thaJeztah thaJeztah merged commit 97be633 into moby:master May 9, 2025
141 checks passed
@corhere corhere deleted the libn/networkdb-broadcast-overflow branch May 9, 2025 17:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants