
Comparing changes

base repository: moby/moby
base: v25.0.10
head repository: moby/moby
compare: v25.0.11
  • 11 commits
  • 8 files changed
  • 1 contributor

Commits on May 15, 2025

  1. libn/networkdb: advertise the configured bind port

    The NetworkDB unit tests instantiate clusters which communicate over
    loopback where every "node" listens on a distinct localhost port. The
    tests make use of a NetworkDB configuration knob to set the port. When
    the NetworkDB configuration's BindPort field is set to a nonzero value,
    its memberlist instance is configured to bind to the specified port
    number. However, the advertise port is left at the
    memberlist.DefaultLANConfig() default value of 7946. Because of this,
    nodes would be unable to contact any of the other nodes in the cluster
    learned by gossip, as the gossiped addresses specify the wrong ports!
    The flaky tests passed as often as they did thanks to the robustness of
    the memberlist module: NetworkDB gossip and memberlist node
    liveness-probe pings to unreachable nodes can all be relayed through
    the reachable nodes, namely the nodes on the bootstrap join list.
    
    Make the NetworkDB unit tests less flaky by setting each node's
    advertise port to the bind port.
    
    The daemon is unaffected by this oversight as it unconditionally uses
    the default listen port of 7946, which aligns with the advertise port.
    
    Signed-off-by: Cory Snider <[email protected]>
    (cherry picked from commit e3f9edd)
    Signed-off-by: Drew Erny <[email protected]>
    corhere authored and dperny committed May 15, 2025
    Commit: e5e7d89
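The fix described above can be sketched as follows. This is a minimal standalone model, not the real NetworkDB code: the `mlConfig` struct only mirrors the relevant hashicorp/memberlist `Config` fields, and `applyBindPort` is a hypothetical helper illustrating the one-line change.

```go
package main

import "fmt"

// mlConfig models the relevant hashicorp/memberlist Config fields
// (field names match the real library; the struct itself is a sketch).
type mlConfig struct {
	BindPort      int
	AdvertisePort int
}

// defaultLANConfig mirrors memberlist.DefaultLANConfig(): both ports
// default to 7946.
func defaultLANConfig() *mlConfig {
	return &mlConfig{BindPort: 7946, AdvertisePort: 7946}
}

// applyBindPort illustrates the fix: when a nonzero BindPort is
// configured, the advertise port must be set to match, otherwise peers
// learned via gossip are advertised with the default port 7946.
func applyBindPort(cfg *mlConfig, bindPort int) {
	if bindPort != 0 {
		cfg.BindPort = bindPort
		cfg.AdvertisePort = bindPort // the fix: advertise the bind port
	}
}

func main() {
	cfg := defaultLANConfig()
	applyBindPort(cfg, 10001)
	fmt.Println(cfg.BindPort, cfg.AdvertisePort)
}
```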
  2. libn/networkdb: listen only on loopback in tests

    NetworkDB defaults to binding to the unspecified address for gossip
    communications, with no advertise address set. In this configuration,
    the memberlist instance listens on all network interfaces and picks one
    of the host's public IP addresses as the advertise address.
    The NetworkDB unit tests don't override this default, leaving them
    vulnerable to flaking out as a result of rogue network traffic
    perturbing the test, or the inferred advertise address not being usable
    for loopback testing. And macOS prompts for permission to allow the test
    executable to listen on public interfaces every time it is rebuilt.
    
    Modify the NetworkDB tests to explicitly bind to, advertise, and join
    ports on 127.0.0.1 to make the tests more robust to flakes in CI and
    more convenient to run locally.
    
    Signed-off-by: Cory Snider <[email protected]>
    (cherry picked from commit 90ec2c2)
    Signed-off-by: Drew Erny <[email protected]>
    corhere authored and dperny committed May 15, 2025
    Commit: c31faae
  3. libn/networkdb: don't exceed broadcast size limit

    NetworkDB uses a hierarchy of queues to prioritize messages for
    broadcast. Unfortunately the logic to pull from multiple queues is
    flawed. The length of the messages pulled from the first queue is not
    taken into account when pulling messages from the second queue. A list
    of messages up to twice as long as the limit could be returned! Messages
    beyond the limit will be truncated unceremoniously by memberlist.
    
    Memberlist broadcast queues assume that all messages returned from a
    GetBroadcasts call will be broadcasted to other nodes in the cluster.
    Messages are popped from the queue once they have hit their retransmit
    limit. On a busy system messages may be broadcast fewer times than
    intended, possibly even being dropped without ever being broadcast!
    
    Subtract the length of messages pulled from the first queue from the
    broadcast size limit so the limit is not exceeded when pulling from the
    second queue.
    
    Signed-off-by: Cory Snider <[email protected]>
    (cherry picked from commit dacf445)
    Signed-off-by: Drew Erny <[email protected]>
    corhere authored and dperny committed May 15, 2025
    Commit: 00d8bed
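The two-queue budgeting described above can be sketched as follows. This is a standalone model of the fixed logic, not NetworkDB's actual queue code: `getBroadcasts` and `take` are hypothetical names, and the key line subtracts the bytes consumed from the first queue before pulling from the second.

```go
package main

import "fmt"

// getBroadcasts pulls messages from a high- and a low-priority queue
// without exceeding limit total bytes. The fix: the bytes taken from
// the first queue are subtracted from the budget for the second.
func getBroadcasts(limit int, high, low [][]byte) [][]byte {
	msgs, used := take(high, limit)
	more, _ := take(low, limit-used) // fixed: remaining budget only
	return append(msgs, more...)
}

// take selects messages that fit within the byte budget, skipping any
// that would overflow it, and reports the bytes consumed.
func take(queue [][]byte, budget int) ([][]byte, int) {
	var out [][]byte
	used := 0
	for _, m := range queue {
		if used+len(m) > budget {
			continue // too big for the remaining budget; skip it
		}
		out = append(out, m)
		used += len(m)
	}
	return out, used
}

func main() {
	high := [][]byte{make([]byte, 600)}
	low := [][]byte{make([]byte, 600), make([]byte, 300)}
	msgs := getBroadcasts(1000, high, low)
	total := 0
	for _, m := range msgs {
		total += len(m)
	}
	fmt.Println(len(msgs), total)
}
```

With the buggy version, both 600-byte messages would be returned (1200 bytes against a 1000-byte limit) and memberlist would silently truncate the overflow.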
  4. Fix possible overlapping IPs when ingressNA == nil

    Logic was added to the Swarm executor in commit 0d9b0ed
    to clean up managed networks whenever the node's load-balancer IP
    address is removed or changed in order to free up the address in the
    case where the container fails to start entirely. Unfortunately, due to
    an oversight the function returns early if the Swarm is lacking
    an ingress network. Remove the early return so that load-balancer IP
    addresses for all the other networks are freed as appropriate,
    irrespective of whether an ingress network exists in the Swarm.
    
    Signed-off-by: Cory Snider <[email protected]>
    (cherry picked from commit 56ad941)
    Signed-off-by: Drew Erny <[email protected]>
    corhere authored and dperny committed May 15, 2025
    Commit: 5c19265
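The control-flow shape of this fix can be sketched in isolation. All names here are illustrative, not the Swarm executor's real identifiers: the point is only that a nil ingress attachment no longer short-circuits cleanup of the other networks.

```go
package main

import "fmt"

// freeAddresses sketches the fixed cleanup path. Before the fix, a nil
// ingress attachment triggered an early return, leaking the
// load-balancer addresses of every other managed network.
func freeAddresses(ingressAddr string, managedAddrs []string) []string {
	var freed []string
	if ingressAddr != "" {
		freed = append(freed, ingressAddr)
	}
	// Fixed: no early return when there is no ingress network; the
	// remaining networks' load-balancer addresses are still freed.
	freed = append(freed, managedAddrs...)
	return freed
}

func main() {
	// Swarm without an ingress network: the managed networks' addresses
	// must still be released.
	freed := freeAddresses("", []string{"10.0.1.2", "10.0.2.2"})
	fmt.Println(len(freed))
}
```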
  5. libn/networkdb: SetPrimaryKey() under a write lock

    (*NetworkDB).SetPrimaryKey() acquires a read lock on the NetworkDB
    instance. That seems sound on the surface as it is only reading from the
    NetworkDB struct, not mutating it. However, concurrent calls to
    (*memberlist.Keyring).UseKey() would get flagged by Go's race detector
    due to some questionable locking in its implementation. Acquire an
    exclusive lock in SetPrimaryKey so concurrent calls don't race each
    other.
    
    Signed-off-by: Cory Snider <[email protected]>
    (cherry picked from commit c9b01e0)
    Signed-off-by: Drew Erny <[email protected]>
    corhere authored and dperny committed May 15, 2025
    Commit: 9c1b0fb
  6. libn/networkdb: fix data race in GetTableByNetwork

    The function was accessing the index map without holding the mutex, so
    it would race any mutation to the database indexes. Fetch the reference
    to the tree's root while holding a read lock. Since the radix tree is
    immutable, taking a reference to the root is equivalent to starting a
    read-only database transaction, providing a consistent view of the data
    at a snapshot in time, even as the live state is mutated concurrently.
    
    Also optimize the WalkTable function by leveraging the immutability of
    the radix tree.
    
    Signed-off-by: Cory Snider <[email protected]>
    (cherry picked from commit ec65f2d)
    corhere committed May 15, 2025
    Commit: 5d123a0
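The snapshot-read pattern can be sketched without the real radix tree. Here the "tree" is modeled as a copy-on-write map behind a root pointer (a standalone sketch, not the go-immutable-radix API): writers build a new root and swap it in under the write lock, and readers grab the root under a read lock and then walk it lock-free.

```go
package main

import (
	"fmt"
	"sync"
)

// store models NetworkDB's use of an immutable index: the root is
// never mutated in place, only replaced wholesale under the lock.
type store struct {
	mu   sync.RWMutex
	root map[string]string
}

// insert builds a new root containing the entry and swaps it in.
func (s *store) insert(k, v string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	next := make(map[string]string, len(s.root)+1)
	for key, val := range s.root {
		next[key] = val
	}
	next[k] = v
	s.root = next
}

// snapshot fetches the root under the read lock. Because roots are
// immutable, the returned reference is a consistent point-in-time view
// even as concurrent writers swap in new roots; the fix was to take
// this reference while holding the lock instead of reading it bare.
func (s *store) snapshot() map[string]string {
	s.mu.RLock()
	defer s.mu.RUnlock()
	return s.root
}

func main() {
	s := &store{root: map[string]string{}}
	s.insert("ep1", "10.0.0.2")
	snap := s.snapshot()
	s.insert("ep2", "10.0.0.3") // does not affect the earlier snapshot
	fmt.Println(len(snap), len(s.snapshot()))
}
```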
  7. libn/networkdb: b'cast watch events from local POV

    NetworkDB gossips changes to table entries to other nodes using distinct
    CREATE, UPDATE and DELETE events. It is unfortunate that the wire
    protocol distinguishes CREATEs from UPDATEs as nothing useful can be
    done with this information. Newer events for an entry invalidate older
    ones, so there is no guarantee that a CREATE event is broadcast to any
    node before an UPDATE is broadcast. And due to the nature of gossip
    protocols, even if the CREATE event is broadcast from the originating
    node, there is no guarantee that any particular node will receive the
    CREATE before an UPDATE. Any code which handles an UPDATE event
    differently from a CREATE event is therefore going to behave in
    unexpected ways in less than perfect conditions.
    
    NetworkDB table watchers also receive CREATE, UPDATE and DELETE events.
    Since the watched tables are local to the node, the events could all
    have well-defined meanings that are actually useful. Unfortunately
    NetworkDB is just bubbling up the wire-protocol event types to the
    watchers. Redefine the table-watch events such that a CREATE event is
    broadcast when an entry pops into existence in the local NetworkDB, an
    UPDATE event is broadcast when an entry which was already present in the
    NetworkDB state is modified, and a DELETE event is broadcast when an
    entry which was already present in the NetworkDB state is marked for
    deletion. DELETE events are broadcast with the same value as the most
    recent CREATE or UPDATE event for the entry.
    
    The handler for endpoint table events in the libnetwork agent assumed
    incorrectly that CREATE events always correspond to adding a new active
    endpoint and that UPDATE events always correspond to disabling an
    endpoint. Fix up the handler to handle CREATE and UPDATE events using
    the same code path, checking the table entry's ServiceDisabled flag to
    determine which action to take.
    
    Signed-off-by: Cory Snider <[email protected]>
    (cherry picked from commit c68671d)
    Signed-off-by: Cory Snider <[email protected]>
    corhere committed May 15, 2025
    Commit: e78b2bd
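The fixed handler shape can be sketched as follows. Types and field names here are illustrative stand-ins (the commit refers to the endpoint table's ServiceDisabled flag): CREATE and UPDATE share one code path, and the entry's flag, not the event type, decides the action.

```go
package main

import "fmt"

type eventType int

const (
	evCreate eventType = iota
	evUpdate
	evDelete
)

// endpointRecord models an endpoint-table entry; ServiceDisabled is
// the flag the fixed handler inspects (sketch, not the real struct).
type endpointRecord struct {
	Name            string
	ServiceDisabled bool
}

// handleEvent sketches the fix: since an UPDATE may be the first event
// a node receives for an entry, CREATE and UPDATE must be handled
// identically, with ServiceDisabled selecting the action.
func handleEvent(ev eventType, rec endpointRecord) string {
	switch ev {
	case evCreate, evUpdate:
		if rec.ServiceDisabled {
			return "disable " + rec.Name
		}
		return "activate " + rec.Name
	case evDelete:
		return "remove " + rec.Name
	}
	return ""
}

func main() {
	// An UPDATE seen first must still be able to activate an endpoint.
	fmt.Println(handleEvent(evUpdate, endpointRecord{Name: "web"}))
	fmt.Println(handleEvent(evCreate, endpointRecord{Name: "api", ServiceDisabled: true}))
}
```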
  8. libn/networkdb: record tombstones for all deletes

    The gossip protocol which powers NetworkDB does not guarantee in-order
    reception of events. This poses a problem with deleting entries: without
    some mechanism to discard stale CREATE or UPDATE events received after a
    DELETE, out-of-order reception of events could result in a deleted entry
    being spuriously resurrected in the local NetworkDB state! NetworkDB
    handles this situation by storing "tombstone" entries for a period of
    time with the Lamport timestamps of the entries' respective DELETE
    events. Out-of-order CREATE or UPDATE events will be ignored by virtue
    of having older timestamps than the tombstone entry, just like how it
    works for entries that have not yet been deleted.
    
    NetworkDB was only storing a tombstone if the entry was already present
    in the local database at the time of the DELETE event. If the first
    event received for an entry is a DELETE, no tombstone is stored. If a
    stale CREATE/UPDATE event for the entry (with an older timestamp than
    the DELETE) is subsequently received, NetworkDB erroneously creates a
    live entry in the local state with stale data. Modify NetworkDB to store
    tombstones for DELETE events irrespective of whether the entry was known
    to NetworkDB beforehand so that it correctly discards out-of-order
    CREATEs and UPDATEs in all cases.
    
    Signed-off-by: Cory Snider <[email protected]>
    (cherry picked from commit ada8bc3)
    Signed-off-by: Cory Snider <[email protected]>
    corhere committed May 15, 2025
    Commit: bc97de4
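The tombstone logic can be sketched with Lamport timestamps alone. This is a standalone model with illustrative names, not NetworkDB's actual table code; the fixed branch is the one that stores a tombstone for a previously unknown key.

```go
package main

import "fmt"

// entry holds the Lamport timestamp of the latest event for a key;
// deleting marks it as a tombstone.
type entry struct {
	ltime    uint64
	deleting bool
	value    string
}

type table map[string]*entry

// handleDelete stores a tombstone even when the key was previously
// unknown, so a stale CREATE/UPDATE arriving later is rejected by the
// timestamp check in handleUpsert.
func (t table) handleDelete(key string, ltime uint64) {
	if e, ok := t[key]; ok {
		if ltime < e.ltime {
			return // stale DELETE
		}
		e.ltime, e.deleting = ltime, true
		return
	}
	// The fix: previously nothing was stored here, so a later
	// out-of-order CREATE/UPDATE would resurrect the entry.
	t[key] = &entry{ltime: ltime, deleting: true}
}

// handleUpsert applies a CREATE/UPDATE only if it is newer than what
// is already stored, be that a live entry or a tombstone.
func (t table) handleUpsert(key, value string, ltime uint64) {
	if e, ok := t[key]; ok && ltime <= e.ltime {
		return // older than the stored entry; discard
	}
	t[key] = &entry{ltime: ltime, value: value}
}

func main() {
	tbl := table{}
	tbl.handleDelete("ep1", 10)         // DELETE is the first event seen
	tbl.handleUpsert("ep1", "stale", 7) // out-of-order stale CREATE
	fmt.Println(tbl["ep1"].deleting)
}
```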
  9. libn/networkdb: Watch() without race conditions

    NetworkDB's Watch() facility is problematic to use in practice. The
    stream of events begins when the watch is started, so the watch cannot
    be used to process table entries that existed beforehand. Either option
    to process existing table entries is racy: walking the table before
    starting the watch leaves a race window where events could be missed,
    and walking the table after starting the watch leaves a race window
    where created/updated entries could be processed twice.
    
    Modify Watch() to initialize the channel with synthetic CREATE events
    for all existing entries owned by remote nodes before hooking it up to
    the live event stream. This way watchers observe an equivalent sequence
    of events irrespective of whether the watch was started before or after
    entries from remote nodes are added to the database. Remove the bespoke
    and racy synthetic event replay logic for driver watches from the
    libnetwork agent.
    
    Signed-off-by: Cory Snider <[email protected]>
    (cherry picked from commit a3aea15)
    Signed-off-by: Cory Snider <[email protected]>
    corhere committed May 15, 2025
    Commit: 545e84c
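The replay-then-attach idea can be sketched as follows. This standalone model uses illustrative types and omits the locking that the real implementation needs to make seeding and attachment atomic; the point is that the channel is pre-filled with synthetic CREATEs before it joins the live stream, so no window exists for missed or duplicated events.

```go
package main

import "fmt"

type event struct {
	typ, key, value string
}

type watchDB struct {
	entries  map[string]string // existing entries owned by remote nodes
	watchers []chan event
}

// Watch seeds the channel with synthetic CREATE events for all
// existing remote entries before hooking it up to the live stream
// (in the real code, both steps happen under one lock).
func (db *watchDB) Watch() <-chan event {
	ch := make(chan event, 128)
	for k, v := range db.entries {
		ch <- event{typ: "CREATE", key: k, value: v} // synthetic replay
	}
	db.watchers = append(db.watchers, ch) // now attach to live events
	return ch
}

func (db *watchDB) broadcast(ev event) {
	for _, ch := range db.watchers {
		ch <- ev
	}
}

func main() {
	db := &watchDB{entries: map[string]string{"ep1": "10.0.0.2"}}
	ch := db.Watch() // started after ep1 already existed
	db.broadcast(event{typ: "UPDATE", key: "ep1", value: "10.0.0.3"})
	ev1, ev2 := <-ch, <-ch
	fmt.Println(ev1.typ, ev2.typ)
}
```

A watcher started before ep1 arrived would see the same CREATE-then-UPDATE sequence, which is the equivalence the commit describes.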
  10. libn/networkdb: stop table events from racing network leaves

    When a node leaves a network or the cluster, or memberlist considers the
    node as failed, NetworkDB atomically deletes all table entries (for the
    left network) owned by the node. This maintains the invariant that table
    entries owned by a node are present in the local database indices iff
    that node is an active cluster member which is participating in the
    network the entries pertain to.
    
    (*NetworkDB).handleTableEvent() is written in a way which attempts to
    minimize the amount of time it is in a critical section with the mutex
    locked for writing. It first checks under a read-lock whether both the
    local node and the node where the event originated are participating in
    the network which the event pertains to. If the check passes, the mutex
    is unlocked for reading and locked for writing so the local database
    state is mutated in a critical section. That leaves a window of time
    between the participation check and the write-lock being acquired for a
    network or node event to arrive and be processed. If a table event for a
    node+network races a node or network event which triggers the purge of
    all table entries for the same node+network, the invariant could be
    violated. The table entry described by the table event may be reinserted
    into the local database state after being purged by the node's leaving,
    resulting in an orphaned table entry which the local node will bulk-sync
    to other nodes indefinitely.
    
    It's not completely wrong to perform a pre-flight check outside of the
    critical section. It allows for an early return in the no-op case
    without having to bear the cost of synchronization. But such optimistic
    concurrency control is only sound if the condition is double-checked
    inside the critical section. It is tricky to get right, and this
    instance of optimistic concurrency control smells like a case of
    premature optimization. Move the pre-flight check into the critical
    section to ensure that the invariant is maintained.
    
    Signed-off-by: Cory Snider <[email protected]>
    (cherry picked from commit 270a4d4)
    Signed-off-by: Cory Snider <[email protected]>
    corhere committed May 15, 2025
    Commit: b7346c5
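The race and its fix can be sketched with a standalone model (illustrative types, a single mutex in place of the real RWMutex): the participation check and the insert share one critical section, so a concurrent network-leave cannot purge entries between the check and the write.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

type netDB struct {
	mu       sync.Mutex
	networks map[string]bool   // networks this node participates in
	entries  map[string]string // table entries keyed by network/key
}

// handleTableEvent performs the participation check inside the same
// critical section as the insert, mirroring the fix. With the check
// done outside the lock, an entry could be reinserted after a
// concurrent leaveNetwork purged it, leaving an orphan.
func (d *netDB) handleTableEvent(network, key, value string) bool {
	d.mu.Lock()
	defer d.mu.Unlock()
	if !d.networks[network] {
		return false // not participating; drop the event
	}
	d.entries[network+"/"+key] = value
	return true
}

// leaveNetwork atomically removes the network and purges its entries,
// maintaining the invariant the commit message describes.
func (d *netDB) leaveNetwork(network string) {
	d.mu.Lock()
	defer d.mu.Unlock()
	delete(d.networks, network)
	for k := range d.entries {
		if strings.HasPrefix(k, network+"/") {
			delete(d.entries, k)
		}
	}
}

func main() {
	d := &netDB{
		networks: map[string]bool{"overlay1": true},
		entries:  map[string]string{},
	}
	d.leaveNetwork("overlay1")
	ok := d.handleTableEvent("overlay1", "ep1", "10.0.0.2")
	fmt.Println(ok, len(d.entries)) // the event is dropped, no orphan
}
```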
  11. Merge pull request #50005 from dperny/25.0-backport-network-fixes

    [25.0] Backport network fixes
    corhere authored May 15, 2025
    Commit: d4f1fb1